
Using AVX2 Instruction Set to Increase Performance of High Performance Computing Code

Author: Gepner Pawel
Publication venue: Institute of Informatics, Slovak Academy of Sciences
Publication date: 19/12/2017

In this paper we discuss the new Intel instruction extensions, Intel Advanced Vector Extensions 2 (AVX2), and what they bring to high performance computing (HPC). To illustrate this, new systems utilizing AVX2 are evaluated to demonstrate how to exploit AVX2 effectively for HPC codes and to expose situations in which AVX2 might not be the most effective way to increase performance.

Computing and Informatics (E-Journal - Institute of Informatics, SAS, Bratislava)

Effective Implementation of DGEMM on Modern Multicore CPU

Author: Gepner Pawel
Gamayunov Victor
Fraser David L.
Publication venue: Published by Elsevier B.V.
Publication date: 10/02/1975

In this paper we present a detailed study of tuning double-precision matrix-matrix multiplication (DGEMM) on the Intel Xeon E5-2680 CPU. We selected an optimal algorithm from the instruction set perspective, as well as software tools optimized for Intel Advanced Vector Extensions (AVX). Our optimizations included the use of vector memory operations and AVX instructions. Our proposed algorithm achieves a performance improvement of 33% compared to the latest results achieved using the Intel Math Kernel Library DGEMM subroutine.

Elsevier - Publisher Connector

Crossref

Repositori Obert de Coneixement de l'Ajuntament de Barcelona

Evaluating the Effectiveness of a Vector-Length-Agnostic Instruction Set

Author: B Zhao
JD McCalpin
M Martineau
N Stephens
P Atkinson
S McIntosh-Smith
T Deakin
Publication venue: Springer Science and Business Media LLC
Publication date: 18/08/2020

Crossref

Explore Bristol Research

Automated Instruction Stream Throughput Prediction for Intel and AMD Microarchitectures

Author: Hager Georg
Hammer Julian
Hofmann Johannes
Laukemann Jan
Wellein Gerhard
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 10/10/2018

An accurate prediction of scheduling and execution of instruction streams is a necessary prerequisite for predicting the in-core performance behavior of throughput-bound loop kernels on out-of-order processor architectures. Such predictions are an indispensable component of analytical performance models, such as the Roofline and the Execution-Cache-Memory (ECM) model, and allow a deep understanding of the performance-relevant interactions between hardware architecture and loop code. We present the Open Source Architecture Code Analyzer (OSACA), a static analysis tool for predicting the execution time of sequential loops comprising x86 instructions under the assumption of an infinite first-level cache and perfect out-of-order scheduling. We show the process of building a machine model from available documentation and semi-automatic benchmarking, and carry it out for the latest Intel Skylake and AMD Zen microarchitectures. To validate the constructed models, we apply them to several assembly kernels and compare runtime predictions with actual measurements. Finally, we give an outlook on how the method may be generalized to new architectures.

Comment: 11 pages, 4 figures, 7 tables

arXiv.org e-Print Archive

Crossref

Modern vector architectures for high-performance computing

Author: Poenaru Andrei
Publication date: 12/05/2022

Explore Bristol Research