635 research outputs found

Adaptive heterogeneous parallelism for semi-empirical lattice dynamics in computational materials science.

Author: Garba Michael
Publication date: 30/04/2015

With the variability in performance of the multitude of parallel environments available today, the conceptual overhead created by the need to anticipate runtime information to make design-time decisions has become overwhelming. Performance-critical applications and libraries carry implicit assumptions based on incidental metrics that are not portable to emerging computational platforms or even alternative contemporary architectures. Furthermore, the significance of runtime concerns such as makespan, energy efficiency and fault tolerance depends on the situational context. This thesis presents a case study in the application of both Mattson's prescriptive pattern-oriented approach and the more principled structured parallelism formalism to the computational simulation of inelastic neutron scattering spectra on hybrid CPU/GPU platforms. The original ad hoc implementation as well as new pattern-based and structured implementations are evaluated for relative performance and scalability. Two new structural abstractions are introduced to facilitate adaptation by lazy optimisation and runtime feedback. A deferred-choice abstraction represents a unified space of alternative structural program variants, allowing static adaptation through model-specific exhaustive calibration with regard to the extrafunctional concerns of runtime, average instantaneous power and total energy usage. Instrumented queues serve as a mechanism for structural composition and provide a representation of extrafunctional state that allows realisation of a market-based decentralised coordination heuristic for competitive resource allocation and the Lyapunov drift algorithm for cooperative scheduling.

Open Access Institutional Repository at Robert Gordon University

Energy aware approach for HPC systems

Author: Basmadjian R.
Cappello F.
Chetsa G. L. T.
Freeh V. W.
Isci C.
Jarus M.
Kimura H.
Meade R. L.
Nagel W. E.
Orgerie A.‐C.
Panas T.
Rivoire S.
Shan H.
Van Der Bijl H. J.
Publication venue: Wiley
Publication date: 18/04/2014

High-performance computing (HPC) systems require energy during their full life cycle, from design and production to transportation, usage and recycling/dismantling. Because of increasing ecological and cost awareness, energy performance is now a primary focus. This chapter focuses on the usage aspect of HPC and how adapted and optimized software solutions could improve energy efficiency. It provides a detailed explanation of server power consumption, and discusses the application of HPC, phase detection, and phase identification. The chapter also suggests that having the load and memory access profiles is insufficient for an effective evaluation of the power consumed by an application. The available leverages in HPC systems are also shown in detail. The chapter proposes some solutions for modeling the power consumption of servers, which allows designing power prediction models for better decision making. These approaches allow the deployment and usage of a set of available green leverages, permitting energy reduction.

HAL-ENS-LYON

Crossref

Scientific Publications of the University of Toulouse II Le Mirail

INRIA a CCSD electronic archive server

Hal-Diderot

HAL-Rennes 1

Power-awareness and smart-resource management in embedded computing systems

Author: Ayala Jose L.
Campanoni Simone
Cattaneo R.
Durelli G.C.
Ferroni M.
Nacci A.
Pagan J.
Santambrogio M.D.
Vallejo M.
Zapater M.
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2015

Archivio istituzionale della ricerca - Politecnico di Milano

Crossref

Power, Performance, and Energy Management of Heterogeneous Architectures

Publication date: 01/01/2019

Modern many-core multiprocessor systems-on-chip offer tremendous power and performance optimization opportunities by tuning thousands of potential voltage, frequency and core configurations. Applications running on these architectures are becoming increasingly complex. As the basic building blocks that make up the application change during runtime, different configurations may become optimal with respect to power, performance or other metrics. Identifying the optimal configuration at runtime is a daunting task due to the large number of workloads and configurations. Therefore, there is a strong need to evaluate the metrics of interest as a function of the supported configurations. This thesis focuses on two different types of modern multiprocessor systems-on-chip (SoC): mobile heterogeneous systems and the tile-based Intel Xeon Phi architecture. For mobile heterogeneous systems, this thesis presents a novel methodology that can accurately instrument different types of applications with specific performance monitoring calls. These calls provide a rich set of performance statistics at the basic block level while the application runs on the target platform. The target architecture used for this work (Odroid XU3) is capable of running at 4940 different frequency and core combinations. With the help of the instrumented applications, a vast amount of characterization data is collected that provides details about performance, power and CPU state at every instrumented basic block across 19 different types of applications. The data collected has enabled two runtime schemes. The first work provides a methodology to find optimal configurations in a heterogeneous architecture using classifiers and demonstrates an average increase of 93%, 81% and 6% in performance per watt (PPW) compared to the interactive, ondemand and powersave governors, respectively. The second work, using the same data, presents a novel imitation learning framework for dynamically controlling the type, number, and frequencies of active cores to achieve an average of 109% PPW improvement compared to the default governors. This work also presents how to accurately profile the tile-based Intel Xeon Phi architecture while training different types of neural networks using an open image dataset on a deep learning framework. The data collected allows deep exploratory analysis. It also showcases how different hardware parameters affect the performance of Xeon Phi.

Dissertation/Thesis: Masters Thesis, Engineering 201

ASU Digital Repository

DNA-inspired Scheme for Building the Energy Profile of HPC Systems

Author: Da Costa Georges
Lefevre Laurent
Pierson Jean-Marc
Stolf Patricia
Tsafack Chetsa Ghislain Landry
Publication venue: HAL CCSD
Publication date: 08/05/2012

Energy usage is becoming a challenge for the design of next-generation large-scale distributed systems. This paper explores an innovative approach to profiling such systems. It proposes a DNA-like solution that makes no assumptions about the running applications or the hardware used. This profiling, based on internal counter usage and energy monitoring, makes it possible to isolate specific phases during execution and enables some energy consumption control and energy usage prediction. First experimental validations of the system modeling are presented and analyzed.

HAL-ENS-LYON

Scientific Publications of the University of Toulouse II Le Mirail

INRIA a CCSD electronic archive server

Hal-Diderot

HAL-Rennes 1

Phase-based tuning for better utilized performance-asymmetric multicores

Author: Sondag Tyler
Publication venue: Iowa State University Digital Repository
Publication date: 01/01/2009

The latest trend towards performance asymmetry among cores on a single chip of a multicore processor is posing new software engineering challenges for developers. A key challenge is that for effective utilization of these performance-asymmetric multicore processors, application threads must be assigned to cores such that the resource needs of a thread closely match resource availability at the assigned core. Determining this assignment manually is tedious and error-prone, and it significantly complicates software development. We contribute a transparent and fully automatic program analysis, which we call phase-guided tuning, to solve this problem. Phase-guided tuning adapts an application to effectively utilize performance-asymmetric cores of a processor. Our technique does not require any changes in the compiler or operating system, thus it is easy to deploy in existing tool chains. It does not require any input from the programmer except the application. Furthermore, it is independent of the characteristics (performance asymmetry) of the target multicore processor, which has two benefits. First, it avoids the need to create multiple customizations of the binary for each target architecture, and second, it relieves the programmer of the burden of anticipating the target architecture. Last but not least, our technique significantly improves performance. Compared to the stock Linux scheduler, our best technique shows 215% improvement in throughput and 36% average process speedup, while maintaining fairness and with negligible overheads.

Digital Repository @ Iowa State University (ISU)

Phase-based Tuning for Better Utilized Multicores

Author: Rajan Hridesh
Sondag Tyler
Publication venue: Iowa State University Digital Repository
Publication date: 23/01/2009

The latest trend towards performance asymmetry among cores on a single chip of a multicore processor is posing new software engineering challenges for developers. A key challenge is that for effective utilization of these performance-asymmetric multicore processors, code sections of a program must be assigned to cores such that the resource needs of a section closely match resource availability at the assigned core. Determining this assignment manually is tedious and error-prone, and it significantly complicates software development. We contribute a transparent and fully automatic program analysis, which we call phase-based tuning, to solve this problem. Phase-based tuning adapts an application to effectively utilize performance-asymmetric cores of a processor. Our technique does not require any changes in the compiler or operating system, thus it is easy to deploy in existing tool chains. It does not require any input from the programmer except the application. Furthermore, it is independent of the characteristics (performance asymmetry) of the target multicore processor, which has two benefits. First, it avoids the need to create multiple customizations of the binary for each target architecture, and second, it relieves the programmer of the burden of anticipating the target architecture. Last but not least, our technique significantly improves performance. Compared to the stock Linux scheduler, our best technique shows 36% average process speedup, while maintaining fairness and with negligible overheads.

Digital Repository @ Iowa State University (ISU)

Power-Performance Modeling and Adaptive Management of Heterogeneous Mobile Platforms

Publication date: 01/01/2018

Nearly 60% of the world population uses a mobile phone, which is typically powered by a system-on-chip (SoC). While mobile platform capabilities range widely, responsiveness, long battery life and reliability are common design concerns that are crucial to remaining competitive. Consequently, state-of-the-art mobile platforms have become highly heterogeneous by combining a powerful SoC with numerous other resources, including display, memory, power management IC, battery and wireless modems. Furthermore, the SoC itself is a heterogeneous resource that integrates many processing elements (PEs), such as CPU cores, GPU, video, image, and audio processors. Therefore, CPU cores do not dominate the platform power consumption under many application scenarios. Competitive performance requires a higher operating frequency, which leads to larger power consumption. In turn, power consumption increases the junction and skin temperatures, which have adverse effects on device reliability and user experience. As a result, allocating the power budget among the major platform resources and controlling temperature have become fundamental considerations for mobile platforms. Dynamic thermal and power management algorithms address this problem by putting a subset of the processing elements or shared resources into sleep states, or by throttling their frequencies. However, an ad hoc approach could easily cripple performance if it slows down the performance-critical processing element. Furthermore, mobile platforms run a wide range of applications with time-varying workload characteristics, unlike early generations, which supported only limited functionality. As a result, there is a need for adaptive power and performance management approaches that consider the platform as a whole, rather than focusing on a subset. Towards this need, our specific contributions include (a) a framework to dynamically select the Pareto-optimal frequency and active cores for heterogeneous CPUs, such as the ARM big.LITTLE architecture, (b) a dynamic power budgeting approach for allocating optimal power consumption to the CPU and GPU using performance sensitivity models for each PE, (c) an adaptive GPU frame time sensitivity prediction model to aid power management algorithms, and (d) an online learning algorithm that constructs adaptive run-time models for non-stationary workloads.

Dissertation/Thesis: Doctoral Dissertation, Electrical Engineering 201
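
As a rough illustration of what selecting a Pareto-optimal frequency and core configuration can mean, the following sketch filters a list of candidate operating points down to those not dominated on power and performance, then picks the fastest one under a power budget. It is not the dissertation's framework: the `Config` fields, the two metrics, and the budget rule are assumptions made only for this example.

```python
from typing import NamedTuple, Optional


class Config(NamedTuple):
    """Hypothetical operating point; fields and values are illustrative only."""
    big_cores: int
    little_cores: int
    freq_mhz: int
    power_w: float  # measured or modelled power, lower is better
    perf: float     # throughput-like metric, higher is better


def dominates(a: Config, b: Config) -> bool:
    """True if a is no worse than b on both metrics and strictly better on one."""
    no_worse = a.power_w <= b.power_w and a.perf >= b.perf
    strictly_better = a.power_w < b.power_w or a.perf > b.perf
    return no_worse and strictly_better


def pareto_front(configs: list[Config]) -> list[Config]:
    """Keep only configurations that no other configuration dominates."""
    return [c for c in configs if not any(dominates(o, c) for o in configs)]


def pick_under_budget(front: list[Config], power_budget_w: float) -> Optional[Config]:
    """Assumed selection rule: fastest Pareto point that fits the power budget."""
    feasible = [c for c in front if c.power_w <= power_budget_w]
    return max(feasible, key=lambda c: c.perf) if feasible else None


# Made-up characterization data for four candidate configurations.
candidates = [
    Config(0, 4, 1000, 1.0, 10.0),
    Config(4, 0, 2000, 4.0, 30.0),
    Config(2, 2, 1500, 3.0, 15.0),
    Config(1, 3, 1400, 3.5, 14.0),  # dominated by the 2+2 configuration
]
front = pareto_front(candidates)
choice = pick_under_budget(front, power_budget_w=3.2)
```

The quadratic dominance check is adequate for a few thousand configurations; a sort-based sweep would be the usual choice for larger spaces.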

ASU Digital Repository

Service-Oriented and Evidence-Aware Mobile Cloud Computing (Teenustele orienteeritud ja tõendite-teadlik mobiilne pilvearvutus)

Author: Flores Huber
Publication date: 07/10/2015

Mobile and cloud computing are two of the biggest forces in computer science. While the cloud provides the user with a ubiquitous computational and storage platform to process complex tasks, the smartphone gives the user the mobility to process simple tasks anytime and anywhere. Smartphones, driven by their need for processing power, storage space and energy saving, are looking towards remote cloud infrastructure to solve these problems. As a result, the main research question of this work is how to bring the cloud infrastructure closer to the mobile user. In this thesis, we investigated how mobile cloud services can be integrated within mobile apps. We found that outsourcing a task to the cloud requires integrating and considering multiple aspects of the clouds, such as resource-intensive processing, asynchronous communication with the client, programmatic provisioning of resources (Web APIs) and cloud intercommunication. Hence, we proposed a Mobile Cloud Middleware (MCM) framework that uses declarative service composition to outsource tasks from the mobile to multiple clouds with minimal data transfer. On the other hand, it has been demonstrated that computational offloading is a key strategy to extend the battery life of the device and improve the performance of mobile apps. We also investigated the issues that prevent the adoption of computational offloading, and proposed a framework, namely Evidence-aware Mobile Computational Offloading (EMCO), which uses a community of devices to capture the possible contexts of code execution as evidence. By analyzing the evidence, EMCO aims to determine the suitable conditions to offload. EMCO models the evidence in terms of distribution rates for both local and remote cases. By comparing those distributions, EMCO infers the right properties to offload. EMCO proves more effective than other computational offloading frameworks explored in the state of the art. Finally, we investigated how computational offloading can be utilized to enhance the perception that the user has of an app. Our main motivation behind accelerating the perception at multiple response time levels is to provide adaptive quality-of-experience (QoE), which can be used as an engagement strategy that increases the lifetime of a mobile app.

DSpace at Tartu University Library

Phase-based tuning: better utilized performance asymmetric multicores

Author: Sondag Tyler
Publication venue: Iowa State University Digital Repository
Publication date: 01/01/2011

The latest trend towards performance asymmetry among cores on a single chip of a multicore processor is posing new challenges. For effective utilization of these performance-asymmetric multicore processors, code sections of a program must be assigned to cores such that the resource needs of code sections closely match resource availability at the assigned core. Determining this assignment manually is tedious and error-prone, and significantly complicates software development. To solve this problem, this thesis describes a transparent and fully automatic process called phase-based tuning, which adapts an application to effectively utilize performance-asymmetric multicores. The basic idea behind this technique is to statically compute groups of program segments that are expected to behave similarly at runtime. Then, at runtime, the behavior of a few code segments is used to infer, with low overhead, the behavior and preferred core assignment of all similar code segments. Compared to the stock Linux scheduler, for systems asymmetric with respect to clock frequency, a 36% average process speedup is observed, while maintaining fairness and with negligible overheads. A key component of phase-based tuning is grouping program segments with similar behavior. The importance of the various similarity metrics is likely to differ for each target asymmetric multicore processor. Determining groups using too many metrics may result in a grouping that differentiates between program segments based on properties irrelevant to a target machine. Using too few metrics may cause relevant metrics to be ignored, so that segments with different behavior are considered similar. Therefore, to solve this problem and enable phase-based tuning for a wide range of performance-asymmetric multicores, this thesis also describes a new technique called lazy grouping. Lazy grouping statically (at compile and install times) groups program segments that are expected to have similar behavior. The basic idea is to combine extensive compile-time analysis with intelligent install-time group assignment (when the target system is known). The accuracy of lazy grouping is shown to be more than 90% for nearly all target machines and asymmetric multicores.
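
The two-step idea in this abstract (group segments statically, then probe a few at runtime and extend the result to the whole group) can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's analysis: the feature signatures, the `probe` callback, the `"fast"`/`"slow"` core labels, and the speedup threshold are all hypothetical.

```python
from collections import defaultdict
from typing import Callable

# Assumption: each segment has already been reduced to a hashable static
# signature (for example a coarse instruction-mix label) by some analysis.
Signature = tuple


def group_segments(static_features: dict[str, Signature]) -> dict[Signature, list[str]]:
    """Static step: segments with the same signature are expected to behave alike."""
    groups: dict[Signature, list[str]] = defaultdict(list)
    for segment, signature in static_features.items():
        groups[signature].append(segment)
    return dict(groups)


def assign_cores(
    groups: dict[Signature, list[str]],
    probe: Callable[[str, str], float],
    min_speedup: float = 1.2,
) -> dict[str, str]:
    """Runtime step: time one representative per group on each core type
    and apply the resulting preference to every segment in that group.

    probe(segment, core_type) is a hypothetical hook returning a runtime in
    seconds; min_speedup is an assumed threshold, not a value from the source.
    """
    assignment: dict[str, str] = {}
    for segments in groups.values():
        representative = segments[0]
        speedup = probe(representative, "slow") / probe(representative, "fast")
        core = "fast" if speedup >= min_speedup else "slow"
        for segment in segments:
            assignment[segment] = core
    return assignment


# Made-up data: two compute-heavy loops share a signature, one copy loop does not.
features = {"loop_a": ("compute",), "loop_b": ("compute",), "copy_c": ("memory",)}
timings = {
    ("loop_a", "fast"): 1.0, ("loop_a", "slow"): 2.0,
    ("copy_c", "fast"): 1.9, ("copy_c", "slow"): 2.0,
}
probed: list[tuple[str, str]] = []


def fake_probe(segment: str, core_type: str) -> float:
    probed.append((segment, core_type))
    return timings[(segment, core_type)]


plan = assign_cores(group_segments(features), fake_probe)
```

Only the representatives are ever timed, which is where the low runtime overhead in such a scheme would come from; `loop_b` inherits its assignment from `loop_a` without being probed.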

Digital Repository @ Iowa State University (ISU)