
    Evaluation of DVFS techniques on modern HPC processors and accelerators for energy-aware applications

    Energy efficiency is becoming increasingly important for computing systems, in particular for large-scale HPC facilities. In this work we evaluate, from a user perspective, the use of Dynamic Voltage and Frequency Scaling (DVFS) techniques, assisted by the power and energy monitoring capabilities of modern processors, to tune applications for energy efficiency. We run selected kernels and a full HPC application on two high-end processors widely used in the HPC context, namely an NVIDIA K80 GPU and an Intel Haswell CPU. We evaluate the available trade-offs between energy-to-solution and time-to-solution, attempting a function-by-function frequency tuning. We finally estimate the benefits obtainable by running the full code on an HPC multi-GPU node, compared with the default clock frequency governors. We instrument our code to accurately monitor power consumption and execution time without the need for any additional hardware, and we enable it to change CPU and GPU clock frequencies while running. We analyze our results on the different architectures using a simple energy-performance model, and derive a number of energy-saving strategies which can be easily adopted on recent high-end HPC systems for generic applications.
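
    As a concrete, hedged illustration of the kind of instrumentation described above, the sketch below uses NVIDIA's NVML library (through the pynvml Python bindings) to switch GPU application clocks around a code region and derive a crude energy-to-solution estimate. The function kernel_under_test and the clock values are placeholders, not the paper's code, and setting application clocks normally requires administrative privileges.

```python
# Hedged sketch: time a GPU region and estimate its energy at a chosen
# core/memory clock pair via NVML (pynvml bindings). kernel_under_test and
# the clock values are placeholders; clocks must come from the device's
# supported list, and changing them usually needs root.
import time
import pynvml

def run_at_clocks(mem_mhz, core_mhz, kernel_under_test):
    pynvml.nvmlInit()
    handle = pynvml.nvmlDeviceGetHandleByIndex(0)
    pynvml.nvmlDeviceSetApplicationsClocks(handle, mem_mhz, core_mhz)
    start = time.time()
    kernel_under_test()                      # placeholder for the region
    power_mw = pynvml.nvmlDeviceGetPowerUsage(handle)  # instantaneous, mW
    elapsed = time.time() - start
    # Crude P*t estimate; a real harness polls power in a background thread.
    energy_j = (power_mw / 1000.0) * elapsed
    pynvml.nvmlDeviceResetApplicationsClocks(handle)
    pynvml.nvmlShutdown()
    return elapsed, energy_j
```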

    Performance and Power Analysis of HPC Workloads on Heterogeneous Multi-Node Clusters

    Performance analysis tools allow application developers to identify and characterize the inefficiencies that cause performance degradation in their codes, allowing for application optimizations. Due to the increasing interest of the High Performance Computing (HPC) community in energy-efficiency issues, it is of paramount importance to be able to correlate performance and power figures within the same profiling and analysis tools. For this reason, we present a performance and energy-efficiency study aimed at demonstrating how a single tool can be used to collect most of the relevant metrics. In particular, we show how the same analysis techniques are applicable to different architectures, analyzing the same HPC application on a high-end and a low-power cluster. The former cluster embeds Intel Haswell CPUs and NVIDIA K80 GPUs, while the latter is made up of NVIDIA Jetson TX1 boards, each hosting an Arm Cortex-A57 CPU and an NVIDIA Tegra X1 Maxwell GPU. The research leading to these results has received funding from the European Community's Seventh Framework Programme [FP7/2007-2013] and Horizon 2020 under the Mont-Blanc projects [17], grant agreements n. 288777, 610402 and 671697. E.C. was partially funded by the “5 per mille” contribution assigned to the University of Ferrara (2014 income-tax declarations). We thank the University of Ferrara and INFN Ferrara for access to the COKA cluster. We warmly thank the BSC tools group for supporting the smooth integration and testing of our setup within Extrae and Paraver.
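
    As a hedged sketch of how performance and energy figures can be collected together on the CPU side, the snippet below reads the Linux powercap/RAPL energy counter around a timed region. The sysfs path is the standard one on Intel systems; the Jetson TX1 boards in the paper expose power through different on-board sensors, and this is not the Extrae/Paraver setup used by the authors.

```python
# Minimal sketch: correlate execution time of a code region with CPU package
# energy via the Linux powercap/RAPL sysfs interface (Intel systems).
import time

RAPL = "/sys/class/powercap/intel-rapl:0/energy_uj"

def read_energy_uj():
    with open(RAPL) as f:
        return int(f.read())

def profile(region):
    e0, t0 = read_energy_uj(), time.time()
    region()                              # the code region being profiled
    e1, t1 = read_energy_uj(), time.time()
    # The counter wraps around; a real tool handles overflow using the
    # adjacent max_energy_range_uj file.
    return (t1 - t0), (e1 - e0) / 1e6     # seconds, joules
```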

    Towards a portable and future-proof particle-in-cell plasma physics code

    We present the first reported OpenCL implementation of EPOCH3D, an extensible particle-in-cell plasma physics code developed at the University of Warwick. We document the challenges and successes of this porting effort, and compare the performance of our implementation executing on a wide variety of hardware from multiple vendors. The focus of our work is on understanding the suitability of existing algorithms for future accelerator-based architectures, and on identifying the changes necessary to achieve performance portability for particle-in-cell plasma physics codes. We achieve good levels of performance with limited changes to the algorithmic behaviour of the code. However, our results suggest that a fundamental change to EPOCH3D's current accumulation step (and its dependency on atomic operations) is necessary in order to fully utilise the massive levels of parallelism supported by emerging parallel architectures.
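
    The accumulation step mentioned above is essentially a scatter-add from particles into grid cells. The following NumPy sketch, purely illustrative and not EPOCH3D code, contrasts a direct scatter-add (the serial analogue of an atomics-based kernel) with a privatized variant in which each worker accumulates into its own grid copy and the copies are reduced afterwards, the kind of restructuring the abstract points towards.

```python
# Illustrative sketch of particle-to-grid accumulation. The "atomic" version
# is what a GPU kernel computes with atomic adds; the "privatized" version
# gives each worker its own grid and reduces them at the end, trading memory
# for contention-free accumulation.
import numpy as np

def deposit_atomic(cells, charges, n_cells):
    grid = np.zeros(n_cells)
    np.add.at(grid, cells, charges)       # unordered scatter-add
    return grid

def deposit_privatized(cells, charges, n_cells, n_workers=4):
    grids = np.zeros((n_workers, n_cells))
    for w, (c, q) in enumerate(zip(np.array_split(cells, n_workers),
                                   np.array_split(charges, n_workers))):
        np.add.at(grids[w], c, q)         # each worker owns grids[w]
    return grids.sum(axis=0)              # final reduction
```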

    Accurate Energy and Performance Prediction for Frequency-Scaled GPU Kernels

    Energy optimization is an increasingly important aspect of today's high-performance computing applications. In particular, dynamic voltage and frequency scaling (DVFS) has become a widely adopted solution to balance performance and energy consumption, and hardware vendors provide management libraries that allow the programmer to change both memory and core frequencies manually to minimize energy consumption while maximizing performance. This article focuses on modeling the energy consumption and speedup of GPU applications under different frequency configurations. The task is not straightforward, because of the large set of possible and uniformly distributed configurations and because of the multi-objective nature of the problem, which minimizes energy consumption and maximizes performance. This article proposes a machine learning-based method to predict the best core and memory frequency configurations on GPUs for an input OpenCL kernel. The method is based on two models for speedup and normalized energy predictions over the default frequency configuration. These are later combined into a multi-objective approach that predicts a Pareto set of frequency configurations. Results show that our approach is very accurate at predicting extrema and the Pareto set, and finds frequency configurations that dominate the default configuration in either energy or performance. Funded by DFG grant 360291326 (CELERITY: Innovative Modelling for Scalable Distributed Runtime Systems).
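
    A hedged sketch of the two-model idea follows: train one regressor for speedup and one for normalized energy, predict both for every candidate frequency configuration, and keep the non-dominated set. The random-forest models and the feature layout are assumptions for illustration, not necessarily the article's models.

```python
# Sketch, not the paper's exact models: two regressors predict speedup and
# normalized energy per (core, mem) frequency configuration; the
# non-dominated configurations form the predicted Pareto set.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def predict_pareto(kernel_features, configs, X_train, y_speedup, y_energy):
    m_speed = RandomForestRegressor().fit(X_train, y_speedup)
    m_energy = RandomForestRegressor().fit(X_train, y_energy)
    X = np.array([np.concatenate([kernel_features, c]) for c in configs])
    s, e = m_speed.predict(X), m_energy.predict(X)
    pareto = []
    for i, cfg in enumerate(configs):
        # keep cfg unless some config is at least as fast AND as frugal,
        # and strictly better in one of the two objectives
        dominated = any(s[j] >= s[i] and e[j] <= e[i] and
                        (s[j] > s[i] or e[j] < e[i])
                        for j in range(len(configs)))
        if not dominated:
            pareto.append(cfg)
    return pareto
```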

    Very High-Temperature Reactor (VHTR) Proliferation Resistance and Physical Protection (PR&PP)

    This report documents the detailed background information that has been compiled to support the preparation of a much shorter white paper on the design features and fuel cycles of Very High-Temperature Reactors (VHTRs), including the proposed Next-Generation Nuclear Plant (NGNP), to identify the important proliferation resistance and physical protection (PR&PP) aspects of the proposed concepts. The shorter white paper derived from the information in this report was prepared for the Department of Energy Office of Nuclear Science and Technology for the Generation IV International Forum (GIF) VHTR Systems Steering Committee (SSC) as input to the GIF Proliferation Resistance and Physical Protection Working Group (PR&PPWG) (http://www.gen-4.org/Technology/horizontal/proliferation.htm). The short white paper was edited by the GIF VHTR SSC to address their concerns and thus may differ from the information presented in this supporting report. The GIF PR&PPWG will use the derived white paper based on this report, along with other white papers on the six alternative Generation IV design concepts (http://www.gen-4.org/Technology/systems/index.htm), to employ an evaluation methodology that can be applied from the earliest stages of design and will evolve with it. This methodology will guide system designers, program policy makers, and external stakeholders in evaluating the response of each system, to determine each system's resistance to proliferation threats and robustness against sabotage and terrorism threats, and thereby guide future international cooperation on ensuring safeguards in the deployment of the Generation IV systems. The format and content of this report are those specified in a template prepared by the GIF PR&PPWG. Other than the level of detail, the key exception to the specified template format is the addition of Appendix C to document the history and status of coated-particle fuel reprocessing technologies, which have yet to be deployed commercially and have only been demonstrated in testing at a laboratory scale.

    Scheduling and drop policies for traffic differentiation on vehicular delay-tolerant networks

    Vehicular Delay-Tolerant Networks (VDTNs) are a promising technology for vehicular communications, creating application scenarios that enable non-real-time services with diverse performance requirements. Because of scarce network resources (e.g., bandwidth and storage capacity) and nodes' short contact durations, the underlying VDTN network infrastructure must be capable of prioritizing traffic. This paper investigates several scheduling and drop policies that can be used to implement traffic differentiation. Priority Greedy, Round Robin, and Time Threshold scheduling policies are proposed. In terms of drop policy, the message with the lowest priority and the lowest remaining time-to-live is discarded first. We evaluate their efficiency and trade-offs through simulation. The results presented in this paper can be used as a starting point for further studies in this research field, and give helpful guidelines for future VDTN protocol design. Part of this work has been supported by Instituto de Telecomunicações, Next Generation Networks and Applications Group (NetGNA), Portugal, in the framework of the Project VDTN@Lab, and by the Euro-NF Network of Excellence of the Seventh Framework Programme of the EU.
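
    The scheduling and drop rules described above translate directly into code. The sketch below, with illustrative message fields rather than the paper's simulator, implements the stated drop policy (lowest priority first, then lowest remaining time-to-live) and a Priority Greedy transmission order.

```python
# Illustrative sketch of the traffic-differentiation rules described above.
# Messages carry an assumed (priority, expiry) pair; higher priority and
# longer remaining time-to-live are preferred.
import time
from dataclasses import dataclass

@dataclass
class Message:
    priority: int          # higher = more important
    expiry: float          # absolute deadline (epoch seconds)
    payload: bytes = b""

def remaining_ttl(m, now=None):
    return m.expiry - (now if now is not None else time.time())

def drop_one(buffer):
    """Drop policy: lowest priority first, then lowest remaining TTL."""
    victim = min(buffer, key=lambda m: (m.priority, remaining_ttl(m)))
    buffer.remove(victim)

def priority_greedy_order(buffer):
    """Priority Greedy scheduling: highest-priority messages sent first."""
    return sorted(buffer, key=lambda m: (-m.priority, -remaining_ttl(m)))
```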

    Effective Cache Apportioning for Performance Isolation Under Compiler Guidance

    With a growing number of cores in modern high-performance servers, effective sharing of the last-level cache (LLC) is more critical than ever. The primary agenda of such systems is to maximize performance by efficiently supporting multi-tenancy of diverse workloads. However, this can be particularly challenging to achieve in practice, because modern workloads exhibit dynamic phase behaviour, which causes their cache requirements and sensitivities to vary at finer granularities during execution. Unfortunately, existing systems are oblivious to application phase behaviour and are unable to detect and react quickly enough to these rapidly changing cache requirements, often incurring significant performance degradation. In this paper, we propose Com-CAS, a new apportioning system that provides dynamic cache allocations for co-executing applications. Com-CAS differs from existing cache partitioning systems by adapting to the dynamic cache requirements of applications just-in-time, as opposed to reactively, and without any hardware modifications. The front-end of Com-CAS consists of compiler analysis equipped with machine learning mechanisms to predict cache requirements, while the back-end consists of a proactive scheduler that dynamically apportions the LLC amongst co-executing applications by leveraging Intel Cache Allocation Technology (CAT). Com-CAS's partitioning scheme utilizes the compiler-generated information at finer granularities to predict rapidly changing dynamic application behaviour, while simultaneously maintaining data locality. Our experiments show that Com-CAS improves average weighted throughput by 15% over an unpartitioned cache system and outperforms the state-of-the-art partitioning system KPart by 20%, while keeping the worst individual application completion-time degradation low enough to meet various Service-Level Agreement (SLA) requirements.
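
    For a sense of the enforcement half only, the hedged sketch below partitions LLC ways between two groups through the Linux resctrl interface to Intel CAT. Group names, way masks, and PIDs are placeholders; it requires root and a mounted resctrl filesystem, and Com-CAS's compiler/ML front-end that chooses the masks is not reproduced here.

```python
# Sketch of LLC apportioning via Linux resctrl (the kernel interface to
# Intel CAT). Group names, masks, and PIDs are placeholders; the policy
# that picks the masks is out of scope. Requires root and mounted resctrl.
import os

RESCTRL = "/sys/fs/resctrl"

def apportion(group, cbm_hex, pids):
    path = os.path.join(RESCTRL, group)
    os.makedirs(path, exist_ok=True)
    with open(os.path.join(path, "schemata"), "w") as f:
        f.write(f"L3:0={cbm_hex}\n")      # contiguous way mask, e.g. "ff0"
    with open(os.path.join(path, "tasks"), "w") as f:
        for pid in pids:                  # move these tasks into the group
            f.write(f"{pid}\n")

# e.g. give a cache-sensitive app 8 ways and a streaming app the other 4:
# apportion("sensitive", "ff0", [1234]); apportion("streaming", "00f", [5678])
```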

    Parallel Model Counting with CUDA: Algorithm Engineering for Efficient Hardware Utilization

    Propositional model counting (MC) and its extensions, as well as applications in the area of probabilistic reasoning, have received renewed attention in recent years. As a result, the need to quickly solve counting-based problems with automated solvers has become critical in certain areas. In this paper, we present experiments evaluating various techniques for improving the performance of parallel model counting on general-purpose graphics processing units (GPGPUs). In particular, we engineer efficient model counting algorithms for GPGPUs that exploit the treewidth of a propositional formula by means of dynamic programming. The combination of our techniques results in the solver GPUSAT3, based on the CUDA programming framework, which, compared to other frameworks, shows superior extensibility and driver support. Combining all the findings of this work, we show that GPUSAT3 not only solves more instances of the recent Model Counting Competition 2020 (MCC 2020) than existing GPGPU-based systems, but also solves them significantly faster. A portfolio combining one of the best solvers of MCC 2020 with GPUSAT3 solves 19% more instances than the former alone, in less than half the runtime.
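
    The closing portfolio result suggests a simple harness: run GPUSAT3 and a CPU-based counter on the same instance in parallel and take whichever answers first. The sketch below is a generic illustration of that idea; the binary names and invocations are placeholders, not the competition setup.

```python
# Generic portfolio sketch: two model counters race on the same CNF
# instance; the first answer wins. Solver names are placeholders.
import subprocess
from concurrent.futures import ThreadPoolExecutor, FIRST_COMPLETED, wait

def run_solver(cmd, cnf):
    out = subprocess.run(cmd + [cnf], capture_output=True, text=True)
    return cmd[0], out.stdout

def portfolio(cnf):
    with ThreadPoolExecutor(max_workers=2) as pool:
        futures = [pool.submit(run_solver, ["gpusat3"], cnf),
                   pool.submit(run_solver, ["cpu_counter"], cnf)]
        done, pending = wait(futures, return_when=FIRST_COMPLETED)
        for f in pending:
            # best effort: a real harness would kill the losing subprocess
            f.cancel()
        return next(iter(done)).result()
```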

    Division of labour and sharing of knowledge for synchronous collaborative information retrieval

    Synchronous collaborative information retrieval (SCIR) is concerned with supporting two or more users who search together at the same time in order to satisfy a shared information need. SCIR systems represent a paradigm shift in the way we view information retrieval, moving from an individual to a group process, and as such the development of novel IR techniques is needed to support it. In this article we present what we believe are two key concepts for the development of effective SCIR, namely division of labour (DoL) and sharing of knowledge (SoK). Together these concepts enable coordinated SCIR such that redundancy across group members is reduced while each group member benefits from the discoveries of their collaborators. We outline techniques from state-of-the-art SCIR systems which support these two concepts, primarily through the provision of awareness widgets. We then outline some of our own work on system-mediated techniques for division of labour and sharing of knowledge in SCIR. Finally, we conclude with a discussion of possible future trends for these two coordination techniques.
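
    A minimal, purely illustrative sketch of the two concepts: division of labour as a round-robin split of a ranked result list so no two collaborators triage the same document, and sharing of knowledge as a shared pool of relevance judgements that filters what each collaborator sees next. None of this is the article's system.

```python
# Illustrative sketch of the two SCIR coordination concepts: division of
# labour (split a ranked list across users) and sharing of knowledge
# (a shared pool of relevance judgements).

def divide_labour(ranked_docs, n_users):
    """Round-robin split: user i triages every n_users-th result."""
    return [ranked_docs[i::n_users] for i in range(n_users)]

class SharedKnowledge:
    """Shared relevance judgements each collaborator can read and extend."""
    def __init__(self):
        self.judgements = {}              # doc_id -> bool (relevant?)

    def record(self, doc_id, relevant):
        self.judgements[doc_id] = relevant

    def filter_unjudged(self, ranked_docs):
        # don't re-show documents a collaborator has already judged
        return [d for d in ranked_docs if d not in self.judgements]
```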