HPC Rankings Based on Real Applications

Author: Baker Nolan
Chandrasekaran Sunita
Eigenmann Rudolf
Jarmusch Aaron
Publication date: 01/08/2020

Extended abstract. Performance benchmarks are used to stress-test the hardware and software of large-scale computing systems. SPEC (the Standard Performance Evaluation Corporation) has developed a benchmark suite, SPEC ACCEL, consisting of test codes representative of kernels in large applications. This project ranks the published SPEC ACCEL results according to different criteria. The goal is to prepare a ranking website for HPC2021, the work-in-progress real-world benchmark suite from the SPEC High Performance Group (HPG), which is expected to be released soon (time frame 2020-2021).
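
Ranking published results "according to different criteria" amounts to sorting the same records by different keys. The Python sketch below assumes a hypothetical record with `system`, `score`, `accelerators`, and `watts` fields (not the actual SPEC ACCEL result format) and three illustrative criteria; it uses competition ranking so that tied results share a position.

```python
from dataclasses import dataclass


@dataclass
class Result:
    # Hypothetical record for one published result (not SPEC's real schema).
    system: str
    score: float        # overall suite score, higher is better
    accelerators: int   # number of accelerator devices used
    watts: float        # hypothetical average power draw


# Illustrative criteria: each maps a result to a value where larger ranks higher.
CRITERIA = {
    "score": lambda r: r.score,
    "score_per_accelerator": lambda r: r.score / r.accelerators,
    "score_per_watt": lambda r: r.score / r.watts,
}


def rank(results, criterion):
    """Return (position, system, value) tuples, best first; ties share a position."""
    key = CRITERIA[criterion]
    ordered = sorted(results, key=key, reverse=True)
    ranking = []
    prev_value = None
    position = 0
    for i, r in enumerate(ordered, start=1):
        value = key(r)
        if value != prev_value:
            position = i  # competition ranking: 1, 2, 2, 4
            prev_value = value
        ranking.append((position, r.system, value))
    return ranking
```

For example, `rank(results, "score_per_accelerator")` orders systems by per-device throughput rather than by raw score. The abstract does not say which criteria the project actually uses, so the three above are only placeholders.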

IUScholarWorks (Indiana University)

Portable, scalable, per-core power estimation for intelligent resource management

Author: Bhadauria M
Cesati M
Gioiosa R
Goel B
McKee SA
Singh K
Publication venue: IEEE
Publication date: 01/01/2010

Performance, power, and temperature are now all first-order design constraints. Balancing power efficiency, thermal constraints, and performance requires some means to convey data about real-time power consumption and temperature to intelligent resource managers. Resource managers can use this information to meet performance goals, maintain power budgets, and obey thermal constraints. Unfortunately, obtaining the required machine introspection is challenging. Most current chips provide no support for per-core power monitoring, and when support exists, it is not exposed to software. We present a methodology for deriving per-core power models using sampled performance counter values and temperature sensor readings. We develop application-independent models for four different (four- to eight-core) platforms, validate their accuracy, and show how they can be used to guide scheduling decisions in power-aware resource managers. Model overhead is negligible, and the estimates exhibit 1.1%-5.2% per-suite median error on the NAS, SPEC OMP, and SPEC 2006 benchmarks (and 1.2%-4.4% overall).
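
The idea of deriving a power model from sampled counters and temperature can be illustrated with an ordinary least-squares fit. This is a minimal sketch assuming a single linear model P ≈ w0 + Σ w_j·x_j, where the x_j are counter rates plus a temperature reading; the authors' actual model form, counter selection, and fitting procedure may differ, and `fit_linear_model`, `predict`, and `median_percent_error` are hypothetical helpers. The last function computes the kind of median error metric quoted in the abstract.

```python
import statistics


def fit_linear_model(features, power):
    """Least squares: power ~ w0 + sum_j w[j] * x[j] (assumed linear form).

    features: one row per sample, e.g. [counter_rate_1, ..., temperature].
    Returns weights [w0, w1, ..., wk].
    """
    rows = [[1.0] + list(f) for f in features]  # prepend an intercept column
    k = len(rows[0])
    # Normal equations: (X^T X) w = X^T y.
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    b = [sum(r[i] * y for r, y in zip(rows, power)) for i in range(k)]
    # Gaussian elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda row: abs(a[row][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, k):
            factor = a[row][col] / a[col][col]
            for c in range(col, k):
                a[row][c] -= factor * a[col][c]
            b[row] -= factor * b[col]
    # Back substitution.
    w = [0.0] * k
    for i in reversed(range(k)):
        w[i] = (b[i] - sum(a[i][j] * w[j] for j in range(i + 1, k))) / a[i][i]
    return w


def predict(w, x):
    """Estimated power for one sample x."""
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))


def median_percent_error(w, features, power):
    """Median absolute percentage error of the model over the given samples."""
    errors = [abs(predict(w, x) - p) / p * 100.0 for x, p in zip(features, power)]
    return statistics.median(errors)
```

In practice one would fit on training samples and report `median_percent_error` separately for each benchmark suite on held-out runs.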

Crossref

Chalmers Research

ART

Solving the Klein-Gordon equation using Fourier spectral methods: A benchmark test for computer performance

Author: Aseeri S.
Batrašev O.
Icardi M.
Leu B.
Li N.
Liu A.
Muite B. K.
Müller E.
Palen B.
Quell M.
Servat H.
Sheth P.
Speck R.
Van Moer M.
Vienne J.
Publication date: 01/01/2015

The cubic Klein-Gordon equation is a simple but non-trivial partial differential equation whose numerical solution has the main building blocks required for the solution of many other partial differential equations. In this study, the library 2DECOMP&FFT is used in a Fourier spectral scheme to solve the Klein-Gordon equation, and strong scaling of the code is examined on thirteen different machines for a problem size of 512³. The results are useful in assessing the likely performance of other parallel fast Fourier transform based programs for solving partial differential equations. The problem is chosen to be large enough to solve on a workstation, yet also of interest to solve quickly on a supercomputer, in particular for parametric studies. Unlike other high performance computing benchmarks, for this problem size the time to solution will not be improved by simply building a bigger supercomputer. Comment: 10 pages
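
The core of a Fourier spectral scheme is computing derivatives by multiplying by the wavenumber in Fourier space. The one-dimensional sketch below shows that building block and assumes the real-valued form u_tt - u_xx + u = u³ for the cubic Klein-Gordon equation; the paper works in three dimensions with distributed FFTs from 2DECOMP&FFT, whereas here a naive O(n²) DFT stands in for an FFT library, and all function names are hypothetical.

```python
import cmath
import math


def dft(x):
    """Naive O(n^2) discrete Fourier transform (stand-in for an FFT library)."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * math.pi * k * j / n) for j in range(n))
            for k in range(n)]


def idft(X):
    """Inverse of dft(); returns complex values."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * j / n) for k in range(n)) / n
            for j in range(n)]


def wavenumbers(n, length):
    """Angular wavenumbers in FFT order: 0, 1, ..., n/2 - 1, -n/2, ..., -1."""
    return [(k if k < n // 2 else k - n) * 2 * math.pi / length for k in range(n)]


def spectral_second_derivative(u, length):
    """u_xx of a periodic real signal: multiply each Fourier mode by -k^2."""
    U = dft(u)
    ks = wavenumbers(len(u), length)
    return [v.real for v in idft([-(k * k) * Uk for k, Uk in zip(ks, U)])]


def klein_gordon_rhs(u, length):
    """Right-hand side of u_tt = u_xx - u + u^3 (assumed 1D real-valued form)."""
    uxx = spectral_second_derivative(u, length)
    return [d - v + v ** 3 for d, v in zip(uxx, u)]
```

A time integrator (for example a leapfrog step on u_tt) would call `klein_gordon_rhs` once per step; in the parallel code, the transforms inside it are where the FFT library, and hence the strong-scaling behaviour, dominates.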

arXiv.org e-Print Archive

OPUS

Juelich Shared Electronic Resources

Comparison of Various Virtualisation Tools for MS Windows

Author: Marek Jan
Publication venue: Vysoké učení technické v Brně. Fakulta informačních technologií (Brno University of Technology, Faculty of Information Technology)
Publication date: 01/01/2012

This thesis compares virtualization tools for the MS Windows platform. It discusses virtualization techniques and methods of measuring the performance of individual computer components, and it describes the benchmarks used to measure performance. Finally, it interprets the benchmark results and presents an overall evaluation of each tested tool.
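
Comparisons of this kind typically run the same workload natively and inside each virtualization tool, then report the slowdown. The harness below is a minimal sketch under that assumption: `measure`, `relative_overhead`, and the sum-of-squares `cpu_workload` are hypothetical stand-ins, not the benchmarks used in the thesis.

```python
import statistics
import time


def measure(workload, repeats=5):
    """Run workload() `repeats` times and return the median wall-clock time in seconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        workload()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def relative_overhead(native_seconds, guest_seconds):
    """Slowdown of the guest run relative to the native run, in percent."""
    return (guest_seconds - native_seconds) / native_seconds * 100.0


def cpu_workload(n=200_000):
    """Hypothetical CPU-bound kernel: sum of squares below n."""
    return sum(i * i for i in range(n))
```

Running `measure(cpu_workload)` on the host and again inside each guest gives the two inputs to `relative_overhead`. Taking the median of several runs limits the influence of one-off scheduling noise, which is especially noticeable under virtualization.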

Digital library of Brno University of Technology

National Repository of Grey Literature

Vector-thread architecture and implementation

Author: Krashinsky Ronny (Ronny Meir), 1978-
Publication venue: Massachusetts Institute of Technology
Publication date: 01/01/2007

Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2007. This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. Includes bibliographical references (p. 181-186).

This thesis proposes vector-thread architectures as a performance-efficient solution for all-purpose computing. The VT architectural paradigm unifies the vector and multithreaded compute models. VT provides the programmer with a control processor and a vector of virtual processors (VPs). The control processor can use vector-fetch commands to broadcast instructions to all the VPs, or each VP can use thread-fetches to direct its own control flow. A seamless intermixing of the vector and threaded control mechanisms allows a VT architecture to flexibly and compactly encode application parallelism and locality. VT architectures can efficiently exploit a wide variety of loop-level parallelism, including non-vectorizable loops with cross-iteration dependencies or internal control flow. The Scale VT architecture is an instantiation of the vector-thread paradigm designed for low-power and high-performance embedded systems. Scale includes a scalar RISC control processor and a four-lane vector-thread unit that can execute 16 operations per cycle and supports up to 128 simultaneously active virtual processor threads. Scale provides unit-stride and strided-segment vector loads and stores, and it implements cache refill/access decoupling. The Scale memory system includes a four-port, non-blocking, 32-way set-associative, 32 KB cache. A prototype Scale VT processor was implemented in 180 nm technology using an ASIC-style design flow. The chip has 7.1 million transistors and a core area of 16.6 mm², and it runs at 260 MHz while consuming 0.4-1.1 W.
This thesis evaluates Scale using a diverse selection of embedded benchmarks, including example kernels for image processing, audio processing, text and data processing, cryptography, network processing, and wireless communication. The benchmarks also include larger applications: a JPEG image encoder and an IEEE 802.11a wireless transmitter. Scale achieves high performance on a range of different types of codes, generally executing 3-11 compute operations per cycle. Unlike other architectures, which improve performance at the expense of increased energy consumption, Scale is generally even more energy efficient than a scalar RISC processor.
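
The interplay of vector-fetches and thread-fetches described in the abstract can be modelled in a few lines. This is a toy Python model of the programming model only: it runs the VPs one after another and says nothing about lanes, cycles, or Scale's actual instruction set. `run_vt`, the block functions, and the halving loop are hypothetical names chosen for illustration. The example loop has a data-dependent trip count per element, the kind of internal control flow a pure vector machine cannot encode directly.

```python
from dataclasses import dataclass, field


@dataclass
class VirtualProcessor:
    """Private register state of one virtual processor (VP)."""
    regs: dict = field(default_factory=dict)


def run_vt(num_vps, blocks, entry, init):
    """Toy model: vector-fetch `entry` on every VP, then let each VP thread-fetch.

    `blocks` maps a block name to a function that updates a VP's registers and
    returns the name of the next block to fetch, or None when that VP is done.
    """
    vps = [VirtualProcessor() for _ in range(num_vps)]
    for vp_id, vp in enumerate(vps):
        init(vp_id, vp.regs)  # e.g. a vector load, one element per VP
    pending = {vp_id: entry for vp_id in range(num_vps)}  # vector-fetch: same block for all
    while pending:
        following = {}
        for vp_id, name in pending.items():
            nxt = blocks[name](vps[vp_id].regs)
            if nxt is not None:
                following[vp_id] = nxt  # thread-fetch: the VP picks its own next block
        pending = following
    return vps


# Example: count halvings until x <= 1 (trip count differs per element).
def start(regs):
    regs["steps"] = 0
    return "halve" if regs["x"] > 1 else None


def halve(regs):
    regs["x"] //= 2
    regs["steps"] += 1
    return "halve" if regs["x"] > 1 else None


BLOCKS = {"start": start, "halve": halve}
DATA = [1, 2, 8, 5]


def load_element(vp_id, regs):
    regs["x"] = DATA[vp_id]


if __name__ == "__main__":
    vps = run_vt(len(DATA), BLOCKS, "start", load_element)
    print([vp.regs["steps"] for vp in vps])  # [0, 1, 3, 2]
```

The single vector-fetch of `start` is shared by all VPs, while each VP's sequence of `halve` thread-fetches is independent, which is the mix of control mechanisms the abstract attributes to VT architectures.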

CiteSeerX

DSpace@MIT