1,627 research outputs found

A Study of Energy and Locality Effects using Space-filling Curves

Authors: Jahre Magnus; Meyer Jan Christian; Reissmann Nico
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 20/06/2016

The cost of energy is becoming an increasingly important driver of the operating cost of HPC systems, adding yet another facet to the challenge of producing efficient code. In this paper, we investigate the energy implications of trading computation for locality using Hilbert and Morton space-filling curves with dense matrix-matrix multiplication. The advantage of these curves is that they exhibit an inherent tiling effect without requiring architecture-specific tuning. By accessing the matrices in the order determined by the space-filling curves, we can trade computation for locality. The index computation overhead of the Morton curve is found to be balanced against its locality and energy efficiency, while the overhead of the Hilbert curve outweighs its improvements on our test system. Comment: Proceedings of the 2014 IEEE International Parallel & Distributed Processing Symposium Workshops (IPDPSW)
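As a rough illustration of the index computation this abstract refers to, the following minimal Python sketch interleaves the bits of a row and column index to obtain a Morton (Z-order) position. It is not the authors' implementation; the function names `morton_encode`, `morton_decode`, and `morton_order`, the 16-bit default, and the choice to place column bits in the even positions are all assumptions made for the example.

```python
def morton_encode(row: int, col: int, bits: int = 16) -> int:
    """Interleave the low `bits` bits of row and col into one Morton code.

    Assumption for this sketch: column bits go to even positions,
    row bits to odd positions.
    """
    code = 0
    for i in range(bits):
        code |= ((col >> i) & 1) << (2 * i)
        code |= ((row >> i) & 1) << (2 * i + 1)
    return code


def morton_decode(code: int, bits: int = 16) -> tuple[int, int]:
    """Inverse of morton_encode: split a Morton code back into (row, col)."""
    row = col = 0
    for i in range(bits):
        col |= ((code >> (2 * i)) & 1) << i
        row |= ((code >> (2 * i + 1)) & 1) << i
    return row, col


def morton_order(n: int) -> list[tuple[int, int]]:
    """Visit every cell of an n x n grid in Morton order (n a power of two)."""
    return [morton_decode(code) for code in range(n * n)]


if __name__ == "__main__":
    # Neighbouring codes stay inside small square blocks, which is the
    # tiling effect the abstract describes.
    print(morton_order(4))
```

The per-element bit loop is the kind of extra computation that is traded for locality; practical implementations usually replace it with lookup tables or bit-manipulation instructions.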

arXiv.org e-Print Archive

CiteSeerX

Efficient dot product over word-size finite fields

Author: Dumas Jean-Guillaume
Publication date: 19/04/2004

We want to achieve efficiency for the exact computation of the dot product of two vectors over word-size finite fields. We therefore compare the practical behavior of a wide range of implementation techniques using different representations. The techniques used include floating-point representations, discrete logarithms, tabulations, Montgomery reduction, and delayed modulus.
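One of the techniques listed, delayed modulus, can be sketched as follows: products are accumulated without reduction, and the modulus is applied only when the next addition could exceed the machine word. This is a minimal sketch under stated assumptions, not the paper's code. The function name `dot_mod_delayed` and the `word_bits` parameter are hypothetical; Python integers do not overflow, so the word limit is simulated, and inputs are assumed to be already reduced to the range [0, p).

```python
def dot_mod_delayed(a, b, p, word_bits=64):
    """Dot product of a and b modulo p, reducing only when necessary.

    Assumptions for this sketch: entries of a and b lie in [0, p),
    and p * p fits in a word of `word_bits` bits.
    """
    limit = 1 << word_bits
    if p * p > limit:
        raise ValueError("p is too large for the simulated word size")
    max_term = (p - 1) * (p - 1)  # largest possible single product
    acc = 0
    for x, y in zip(a, b):
        # Reduce only if adding one more product could reach the word limit.
        if acc + max_term >= limit:
            acc %= p
        acc += x * y
    return acc % p


if __name__ == "__main__":
    a, b, p = [3, 4, 5], [6, 7, 8], 11
    naive = sum(x * y for x, y in zip(a, b)) % p
    print(dot_mod_delayed(a, b, p), naive)
```

With a 64-bit word and a small prime, many products can be accumulated between reductions, which is where the saving over reducing after every multiplication comes from.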

arXiv.org e-Print Archive

Hal - Université Grenoble Alpes

Auto-tuning compiler options for HPC

Author: Jones Jessica
Publication date: 04/09/2019

OPUS

Vectorized register tiling

Authors: Berna Juan Alejandro; Jiménez Castells Marta; Llaberia Griñó José M.
Publication date: 01/01/2012

In recent years, much effort has gone into making commercial compilers (icc, gcc) exploit efficiently the SIMD capabilities and the memory hierarchy that current processors offer. However, the few compilers that can automatically exploit these characteristics achieve unsatisfactory results in most cases. Programmers therefore often need to apply the optimizations to the source code by hand, write the code manually in assembly, or use compiler built-in functions (such as intrinsics) to achieve high performance. In this work, we present source-to-source transformations that help commercial compilers exploit the memory hierarchy and generate efficient SIMD code. Results from our experiments show that our solutions perform as well as hand-optimized, vendor-supplied numerical libraries (written in assembly). Peer Reviewed. Preprint.

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Psort: automated code tuning

This thesis describes the design and implementation of an automated code tuner for psort, a fast sorting library for large datasets. Our work, motivated by the need to guarantee high performance while keeping the cost to the end user low, provides a reusable and portable framework that can be easily extended to automatically tune virtually every portion of the source code, including code that has not yet been written. Experiments show that our system produces code that is significantly faster than the original code, suggesting that psort should include it among its tools.

Padua Thesis and Dissertation Archive

Learning from the Success of MPI

Authors: A. Geist; A. Skjellum; C.H. Koelbel; J. Boyle; J. Cownie; J. Dongarra; J.L. Traeff; K. Krechmer; Message Passing Interface Forum; Message Passing Interface Forum MPI2; N. Carriero; O. Zaki; P.B. Hansen; R. Hempel; R.C. Whaley; R.W. Numrich; W. Gropp; W.W. Carlson
Publication date: 01/01/2001

The Message Passing Interface (MPI) has been extremely successful as a portable way to program high-performance parallel computers. This success has occurred in spite of the view of many that message passing is difficult and that other approaches, including automatic parallelization and directive-based parallelism, are easier to use. This paper argues that MPI has succeeded because it addresses all of the important issues in providing a parallel programming model. Comment: 12 pages, 1 figure

arXiv.org e-Print Archive

CiteSeerX

Crossref

UNT Digital Library