UPGRO Hidden Crisis Research consortium: unravelling past failures for future success in rural water supply: initial project approach for assessing rural water supply functionality and levels of performance
The new Sustainable Development Goals (SDGs) set a much stronger focus on the sustainability and performance of water services, and include the highly ambitious goal of achieving universal access to safe and reliable water for all by 2030 (UN 2013). Poor functionality of water points threatens to undermine progress, and a lack of knowledge of the reasons behind this makes it difficult to recommend improvements and take corrective action. As a first step it is necessary to be able to reliably monitor current rates of functionality and to have a clear benchmark for what constitutes a functional water point. Currently, there is no single accepted definition of functionality, although organisations are working towards one as a means of tracking progress towards the SDGs.
This report sets out the initial work by the Hidden Crisis project to develop a framework approach for assessing functionality in terms of different levels of performance, together with a set of standard indicators which can be used to assess functionality. The report presents the results of a literature review examining the following questions: (1) what are the current approaches to defining the functionality of hand-pump boreholes (HPBs); and (2) what are robust standards by which the functionality of an HPB, or a population of HPBs, can be assessed. From analyses of the literature we have developed preliminary guidelines and applied these to develop a preliminary framework.
Learning from the Success of MPI
The Message Passing Interface (MPI) has been extremely successful as a portable way to program high-performance parallel computers. This success has occurred in spite of the view of many that message passing is difficult and that other approaches, including automatic parallelization and directive-based parallelism, are easier to use. This paper argues that MPI has succeeded because it addresses all of the important issues in providing a parallel programming model.
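The core of the message-passing model MPI embodies is that workers share no state and cooperate only through explicit send/receive operations. As a toy illustration of that pattern (not MPI itself — this hedged sketch uses Python threads and queues, so it shows the communication structure, not MPI's performance model, and all names are ours):

```python
# Toy illustration of the message-passing pattern: workers share nothing
# and communicate only via explicit send (put) / receive (get).
import threading
import queue

def worker(inbox, outbox):
    # Receive a chunk of work, compute locally, send the result back.
    chunk = inbox.get()
    outbox.put(sum(chunk))

def parallel_sum(data, nworkers=2):
    # The "root" splits the data and exchanges messages with each worker.
    outbox = queue.Queue()
    threads = []
    for i in range(nworkers):
        inbox = queue.Queue()
        t = threading.Thread(target=worker, args=(inbox, outbox))
        t.start()
        inbox.put(data[i::nworkers])  # send the i-th strided chunk
        threads.append(t)
    total = sum(outbox.get() for _ in range(nworkers))
    for t in threads:
        t.join()
    return total

print(parallel_sum(list(range(100))))  # 4950
```

In real MPI the same structure appears as ranks calling `MPI_Send`/`MPI_Recv` (or a collective such as `MPI_Reduce`), with processes on separate nodes rather than threads.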
Predictive runtime code scheduling for heterogeneous architectures
Heterogeneous architectures are currently widespread. With the advent of easy-to-program general-purpose GPUs, virtually every recent desktop computer is a heterogeneous system. Combining the CPU and the GPU brings great amounts of processing power. However, such architectures are often used in a restricted way for domain-specific applications like scientific applications and games, and they tend to be used by a single application at a time. We envision future heterogeneous computing systems where all heterogeneous resources are continuously utilized by different applications with versioned critical parts, so as to better adapt their behavior and improve execution time, power consumption, response time, and other constraints at runtime. Under such a model, adaptive scheduling becomes a critical component.
In this paper, we propose a novel predictive user-level scheduler for heterogeneous systems based on past performance history. We developed several scheduling policies and present a study of their impact on system performance. We demonstrate that such a scheduler allows multiple applications to fully utilize all available processing resources in CPU/GPU-like systems and consistently achieve speedups ranging from 30% to 40% compared to using only the GPU in single-application mode.
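The central mechanism described here — choosing a device from past performance history — can be sketched compactly. This is a hedged illustration of the general idea, not the paper's scheduler; the class and method names (`HistoryScheduler`, `pick`, `record`) are ours:

```python
# Hedged sketch of history-based device selection: record observed runtimes
# per (task kind, device) pair and dispatch new tasks to the device with the
# lowest average runtime, profiling unseen devices first.
from collections import defaultdict

class HistoryScheduler:
    def __init__(self, devices):
        self.devices = devices
        self.history = defaultdict(list)  # (task_kind, device) -> runtimes

    def pick(self, task_kind):
        # Profile any device we have no history for, so predictions
        # eventually cover the whole system.
        unprofiled = [d for d in self.devices
                      if not self.history[(task_kind, d)]]
        if unprofiled:
            return unprofiled[0]
        # Otherwise predict from the mean of past runtimes.
        def avg(dev):
            runs = self.history[(task_kind, dev)]
            return sum(runs) / len(runs)
        return min(self.devices, key=avg)

    def record(self, task_kind, device, runtime):
        self.history[(task_kind, device)].append(runtime)
```

Usage: after timing a `"matmul"` task at 2.0 s on the CPU and 0.5 s on the GPU, `pick("matmul")` returns the GPU; a CPU-friendly task kind with the opposite history would be routed to the CPU, which is how multiple applications can keep both devices busy.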
Exploring the optimization space of dense linear algebra kernels
Abstract. Dense linear algebra kernels such as matrix multiplication have been used as benchmarks to evaluate the effectiveness of many automated compiler optimizations. However, few studies have looked at collectively applying the transformations and parameterizing them for external search. In this paper, we take a detailed look at the optimization space of three dense linear algebra kernels. We use a transformation scripting language (POET) to implement each kernel-level optimization as applied by ATLAS. We then extensively parameterize these optimizations from the perspective of a general-purpose compiler and use a standalone empirical search engine to explore the optimization space using several different search strategies. Our exploration of the search space reveals key interactions among several transformations that must be considered by compilers to approach the level of efficiency obtained through manual tuning of kernels.
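POET scripts and ATLAS internals are beyond a short excerpt, but the underlying idea — expose a transformation's parameters and search them empirically — fits in a sketch. In this hedged Python illustration (our names, not the paper's), the tile size of a blocked matrix multiply is the single search knob and wall-clock time is the objective:

```python
# Illustrative sketch of empirical search over one parameterized
# transformation: loop tiling (cache blocking) of matrix multiply,
# with the tile size `bs` as the tunable parameter.
import random
import time

def matmul_blocked(A, B, bs):
    """Blocked matrix multiply of square lists-of-lists; bs = tile size."""
    n = len(A)
    C = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, bs):
        for kk in range(0, n, bs):
            for jj in range(0, n, bs):
                for i in range(ii, min(ii + bs, n)):
                    for k in range(kk, min(kk + bs, n)):
                        aik = A[i][k]
                        for j in range(jj, min(jj + bs, n)):
                            C[i][j] += aik * B[k][j]
    return C

def search_block_size(n=64, candidates=(4, 8, 16, 32)):
    """Empirical search: time each candidate tile size, return the best."""
    random.seed(0)
    A = [[random.random() for _ in range(n)] for _ in range(n)]
    B = [[random.random() for _ in range(n)] for _ in range(n)]
    timings = {}
    for bs in candidates:
        t0 = time.perf_counter()
        matmul_blocked(A, B, bs)
        timings[bs] = time.perf_counter() - t0
    return min(timings, key=timings.get), timings
```

A real autotuner searches many interacting parameters at once (tiling, unrolling, vectorization, scheduling), which is exactly where the transformation interactions the paper reports become visible.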
GPU vs FPGA: A Comparative Analysis for Non-standard Precision
Abstract. FPGAs and GPUs are increasingly used in a range of high performance computing applications. When implementing numerical algorithms on either platform, we can choose to represent operands with different levels of accuracy. A trade-off exists between the numerical accuracy of arithmetic operators and the resources needed to implement them. Where algorithmic requirements for numerical stability are captured in a design description, this trade-off can be exploited to optimize performance by using high-accuracy operators only where they are most required. Support for half and double-double floating point representations allows additional flexibility to achieve this. The aim of this work is to study the language and hardware support, and the achievable peak performance for non-standard precisions on a GPU and an FPGA. A compute-intensive program, matrix-matrix multiply, is selected as a benchmark and implemented for various matrix sizes. The results show that for large-enough matrices, GPUs outperform FPGA-based implementations, but for some smaller matrix sizes, specialized FPGA floating-point operators for half and double-double precision can deliver higher throughput than implementation on a GPU.
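Double-double, one of the non-standard precisions studied, represents a value as an unevaluated sum of two ordinary doubles, roughly doubling the significand width. A minimal sketch of the standard building block (Knuth's error-free two-sum — a well-known technique, not code from the paper; function names are ours):

```python
# Double-double arithmetic sketch: a value is the pair (hi, lo) with the
# true value hi + lo, where lo holds rounding error hi alone would lose.

def two_sum(a, b):
    """Knuth's error-free transformation: a + b == s + err exactly."""
    s = a + b
    bb = s - a
    err = (a - (s - bb)) + (b - bb)
    return s, err

def dd_add(x, y):
    """Add two double-double values x = (hi, lo) and y = (hi, lo)."""
    s, e = two_sum(x[0], y[0])
    e += x[1] + y[1]          # fold in the low-order parts
    return two_sum(s, e)      # renormalize to (hi, lo)

# In plain double precision, 1.0 + 1e-17 rounds back to 1.0; double-double
# keeps the small term in the low component instead of discarding it.
hi, lo = dd_add((1.0, 0.0), (1e-17, 0.0))
```

In hardware terms, each double-double operation costs several ordinary floating-point operations, which is the accuracy/resource trade-off the paper exploits on both platforms.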