
    Power Aware Computing on GPUs

    Energy and power density concerns in modern processors have led to significant computer architecture research efforts in power-aware and temperature-aware computing. With power dissipation becoming an increasingly vexing problem, power analysis of the Graphical Processing Unit (GPU) and its components has become crucial for hardware and software system design. Here, we describe a coordinated measurement approach that combines real total power measurement with per-component power estimation. To estimate power consumption accurately, we introduce the Activity-based Model for GPUs (AMG), which uses microbenchmarks to derive activity factors and power for GPU microarchitectural components, helping to analyze the power tradeoffs of one component versus another. The key challenge addressed in this thesis is real-time power consumption, which can be estimated accurately using NVIDIA's Management Library (NVML) through Pthreads. We validated our model against a Kill-A-Watt power meter, and the results are accurate to within 10%. The resulting Performance Application Programming Interface (PAPI) NVML component offers real-time total power measurements for GPUs. This thesis also compares a single NVIDIA C2075 GPU running MAGMA (Matrix Algebra on GPU and Multicore Architectures) kernels to a 48-core AMD Istanbul CPU running LAPACK.
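
    The mechanism the abstract names, sampling total board power through NVML from a dedicated Pthread while GPU work runs, can be sketched in a few lines of C. This is a minimal illustration and not the thesis code; the 100 ms sampling interval and device index 0 are assumptions.

        #include <stdio.h>
        #include <unistd.h>
        #include <pthread.h>
        #include <nvml.h>

        static volatile int sampling = 1;

        /* Sampler thread: poll total board power until told to stop. */
        static void *power_sampler(void *arg) {
            nvmlDevice_t dev = *(nvmlDevice_t *)arg;
            while (sampling) {
                unsigned int mw;                 /* power in milliwatts */
                if (nvmlDeviceGetPowerUsage(dev, &mw) == NVML_SUCCESS)
                    printf("power: %.1f W\n", mw / 1000.0);
                usleep(100000);                  /* assumed 100 ms interval */
            }
            return NULL;
        }

        int main(void) {
            nvmlDevice_t dev;
            pthread_t tid;

            nvmlInit();
            nvmlDeviceGetHandleByIndex(0, &dev); /* device 0 assumed */
            pthread_create(&tid, NULL, power_sampler, &dev);

            sleep(2);  /* stand-in for launching and timing a GPU kernel */

            sampling = 0;
            pthread_join(tid, NULL);
            nvmlShutdown();
            return 0;
        }

    Compiled with -lnvidia-ml -lpthread, the sampler runs concurrently with the measured workload, which is what lets the approach report power in real time rather than after the fact.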

    Scalable Applications on Heterogeneous System Architectures: A Systematic Performance Analysis Framework

    The efficient parallel execution of scientific applications is a key challenge in high-performance computing (HPC). With growing parallelism and heterogeneity of compute resources as well as increasingly complex software, performance analysis has become an indispensable tool in the development and optimization of parallel programs. This thesis presents a framework for systematic performance analysis of scalable, heterogeneous applications. Based on event traces, it automatically detects the critical path and inefficiencies that result in waiting or idle time, e.g. due to load imbalances between parallel execution streams. As a prerequisite for the analysis of heterogeneous programs, this thesis specifies inefficiency patterns for computation offloading. Furthermore, an essential contribution was made to the development of tool interfaces for OpenACC and OpenMP, which enable portable data acquisition and subsequent analysis for programs with offload directives. At present, these interfaces are already part of the latest OpenACC and OpenMP API specifications. The aforementioned work, existing preliminary work, and established analysis methods are combined into a generic analysis process that can be applied across programming models. Based on the detection of wait or idle states, which can propagate over several levels of parallelism, the analysis identifies wasted computing resources and their root cause, as well as the critical-path share of each program region. Thus, it determines the influence of program regions on the load balancing between execution streams and on the program runtime. The analysis results include a summary of the detected inefficiency patterns and a program trace enhanced with information about wait states, their cause, and the critical path. In addition, a ranking based on the amount of waiting time a program region caused on the critical path highlights program regions that are relevant for program optimization. The scalability of the proposed performance analysis and its implementation is demonstrated using High-Performance Linpack (HPL), while the analysis results are validated with synthetic programs. A scientific application that uses MPI, OpenMP, and CUDA simultaneously is investigated in order to show the applicability of the analysis.
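
    To make the wait-state idea concrete, the C sketch below shows the basic load-imbalance pattern on which such trace analyses rest: at a synchronization point, each execution stream's waiting time is the gap between its own arrival and the last arrival, and the last-arriving stream lies on the critical path. This is a toy illustration with invented timestamps, not the framework's implementation.

        #include <stdio.h>

        #define STREAMS 4

        int main(void) {
            /* Arrival times (seconds) of four parallel streams at a
               barrier; values are made up for illustration. */
            double arrival[STREAMS] = {1.00, 1.45, 1.10, 1.42};

            /* The last stream to arrive defines when the barrier completes. */
            double last = arrival[0];
            for (int i = 1; i < STREAMS; i++)
                if (arrival[i] > last) last = arrival[i];

            /* Each stream's wait state is the time it idles until then;
               the latest stream (zero wait) is on the critical path. */
            for (int i = 0; i < STREAMS; i++)
                printf("stream %d waits %.2f s%s\n", i, last - arrival[i],
                       arrival[i] == last ? "  <- critical path" : "");
            return 0;
        }

    The real analysis applies this reasoning to full event traces, where wait states caused in one region can propagate through later synchronization points and across levels of parallelism.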

    A task-based parallelism and vectorized approach to 3D Method of Characteristics (MOC) reactor simulation for high performance computing architectures

    In this study we present and analyze a formulation of the 3D Method of Characteristics (MOC) technique applied to the simulation of full core nuclear reactors. Key features of the algorithm include a task-based parallelism model that allows independent MOC tracks to be assigned to threads dynamically, ensuring load balancing, and a wide vectorizable inner loop that takes advantage of modern SIMD computer architectures. The algorithm is implemented in a set of highly optimized proxy applications in order to investigate its performance characteristics on CPU, GPU, and Intel Xeon Phi architectures. Speed, power, and hardware cost efficiencies are compared. Additionally, performance bottlenecks are identified for each architecture in order to determine the prospects for continued scalability of the algorithm on next generation HPC architectures.

    Keywords: Method of Characteristics; Neutron transport; Reactor simulation; High performance computing
    United States. Department of Energy (Contract DE-AC02-06CH11357)
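
    The two algorithmic features named above, dynamic assignment of independent tracks to threads and a wide vectorizable inner loop, map naturally onto OpenMP constructs, as the C sketch below shows. It is illustrative only: the attenuation update is the standard flat-source MOC segment formula, and the array names and dimensions are invented.

        #include <math.h>

        #define TRACKS   100000
        #define SEGMENTS 64
        #define GROUPS   128   /* energy groups: the wide, vectorizable dimension */

        void sweep(float psi[TRACKS][GROUPS],
                   const float sigma_t[GROUPS],
                   const float q[GROUPS],
                   const float length[TRACKS][SEGMENTS]) {
            /* Task-based parallelism: independent tracks are handed to
               threads dynamically so long and short tracks stay balanced. */
            #pragma omp parallel for schedule(dynamic)
            for (int t = 0; t < TRACKS; t++) {
                for (int s = 0; s < SEGMENTS; s++) {
                    /* Wide inner loop over energy groups vectorizes with SIMD. */
                    #pragma omp simd
                    for (int g = 0; g < GROUPS; g++) {
                        float tau = sigma_t[g] * length[t][s];
                        float att = expf(-tau);
                        /* Flat-source MOC update: attenuate the angular flux
                           and add the in-segment source contribution. */
                        psi[t][g] = psi[t][g] * att
                                  + (q[g] / sigma_t[g]) * (1.0f - att);
                    }
                }
            }
        }

    Compiled with -O3 -fopenmp, the dynamic schedule absorbs the variable work per track while the unit-stride group loop keeps the SIMD lanes full, which is the combination the study evaluates across CPU, GPU, and Xeon Phi.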

    LEONARDO: A Pan-European Pre-Exascale Supercomputer for HPC and AI Applications

    A new pre-exascale computer cluster, called LEONARDO, has been designed to foster scientific progress and competitive innovation across European research systems. This paper describes the general architecture of the system and focuses on the technologies adopted for its GPU-accelerated partition. High-density processing elements, fast data movement capabilities, and a mature software stack allow the machine to run intensive workloads in a flexible and scalable way. Scientific applications from traditional High Performance Computing (HPC) as well as emerging Artificial Intelligence (AI) domains can benefit from this large apparatus in terms of time and energy to solution.

    Comment: 16 pages, 5 figures, 7 tables, to be published in the Journal of Large Scale Research Facilities

    A Quantitative Study of Advanced Encryption Standard Performance as it Relates to Cryptographic Attack Feasibility

    The Advanced Encryption Standard (AES) is the premier symmetric-key cryptosystem in use today. Given its prevalence, the security provided by AES is of utmost importance. Technology is advancing at an incredible rate, in both capability and popularity, much faster than in the late 1990s when AES was selected as the replacement standard for DES. Although the literature surrounding AES is robust, most studies are either theoretical or practical yet infeasible. This research takes a unique approach drawn from the performance field, exploiting the dual nature of AES performance: it uses benchmarks to assess the performance potential of computer systems for both general-purpose computation and AES. Since general performance information is readily available, the relationship between the two may be used as a predictor for AES performance and, consequently, for attack potential. The design involved distributing to facilitators USB drives containing a bootable Linux operating system and the benchmark instruments. Upon boot, these devices conducted the benchmarks, gathered system specifications, and submitted the results to a server for regression analysis. Although it is likely to be many years in the future, the results of this study may help better predict when attacks against AES key lengths will become feasible.
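
    The study's predictive step is, at bottom, a regression of AES performance against general-purpose benchmark score. The C sketch below shows a simple ordinary-least-squares version of that fit; the data points are invented, and the actual study's regression analysis is more elaborate.

        #include <stdio.h>

        int main(void) {
            /* Invented sample points: general-purpose benchmark score (x)
               versus measured AES throughput in MB/s (y), one per system. */
            double x[] = {1200, 1850, 2400, 3100, 4000};
            double y[] = { 310,  520,  660,  900, 1150};
            int n = 5;

            /* Ordinary least squares for y = a + b*x. */
            double sx = 0, sy = 0, sxx = 0, sxy = 0;
            for (int i = 0; i < n; i++) {
                sx += x[i]; sy += y[i];
                sxx += x[i] * x[i]; sxy += x[i] * y[i];
            }
            double b = (n * sxy - sx * sy) / (n * sxx - sx * sx);
            double a = (sy - b * sx) / n;

            /* Predict AES throughput for a system whose general-purpose
               benchmark score is known but whose AES speed is not. */
            double score = 2800;
            printf("fit: y = %.2f + %.4f x; predicted AES at score %.0f: %.0f MB/s\n",
                   a, b, score, a + b * score);
            return 0;
        }

    Once fitted, such a model lets readily available general-performance figures stand in for AES-specific measurements when estimating how brute-force attack capability scales over time.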