Search CORE

4 research outputs found

Performance of the Cell processor for biomolecular simulations

Author: Allen
G. De Fabritiis
Humphrey
Kistler
MacKerell
Phillips
Publication venue: 'Elsevier BV'
Publication date: 01/03/2007

The new Cell processor represents a turning point for compute-intensive applications. Here, I show that for molecular dynamics it is possible to reach an impressive sustained performance in excess of 30 Gflops, with a peak of 45 Gflops, for the non-bonded force calculations, more than an order of magnitude faster than a standard single-core processor.

arXiv.org e-Print Archive

Crossref

The impact of accelerator processors for high-throughput molecular modeling and simulation

Author: Bajorath
Collin
De Fabritiis
Dickson
Ekins
Fowler
G. De Fabritiis
G. Giupponi
Gohlke
Humphrey
Jorgensen
Ladbury
Litzkow
M.J. Harvey
Manavski
Moore
Nightingale
Phillips
Shaw
Smit
Stjernschantz
Stone
Susukita
Swaan
Ufimtsev
van Meel
Zaho
Publication venue: 'Elsevier BV'
Publication date: 01/12/2008

Accepted version

Crossref

Spiral - Imperial College Digital Repository

Profile-directed specialisation of custom floating-point hardware

Author: Brown Ashley W.
Publication venue: Computing, Imperial College London
Publication date: 01/05/2010

We present a methodology for generating floating-point arithmetic hardware designs which are, for suitable applications, much reduced in size, while still retaining performance and IEEE-754 compliance. Our system uses three key parts: a profiling tool, a set of customisable floating-point units and a selection of system integration methods. We use a profiling tool for floating-point behaviour to identify arithmetic operations where fundamental elements of IEEE-754 floating-point may be compromised without generating erroneous results in the common case. In the uncommon case, we use simple detection logic to determine when operands lie outside the range of capabilities of the optimised hardware. Out-of-range operations are handled by a separate, fully capable floating-point implementation, either on-chip or by returning calculations to a host processor. We present methods of system integration to achieve this error correction. Thus the system suffers no compromise in IEEE-754 compliance, even when the synthesised hardware would generate erroneous results. In particular, we identify from input operands the shift amounts required for input operand alignment and post-operation normalisation. For operations where these are small, we synthesise hardware with reduced-size barrel-shifters. We also propose optimisations to take advantage of other profile-exposed behaviours, including removing the hardware required to swap operands in a floating-point adder or subtractor, and reducing the exponent range to fit observed values. We present profiling results for a range of applications, including a selection of computational science programs, SPEC FP 95 benchmarks and the FFmpeg media processing tool, indicating which would be amenable to our method. Selected applications which demonstrate potential for optimisation are then taken through to a hardware implementation.
We show up to a 45% decrease in hardware size for a floating-point datapath, with a correctable error rate of less than 3%, even with non-profiled datasets.

Spiral - Imperial College Digital Repository

Solving Hyperbolic PDEs using Accelerator Architectures

Author: Rostrup Scott
Publication venue: 'University of Waterloo'
Publication date: 15/07/2009

Accelerator architectures are used to accelerate the simulation of nonlinear hyperbolic PDEs. Three different architectures are investigated: a multicore CPU using threading, IBM’s Cell processor, and Nvidia’s Tesla GPUs. Speed-ups of 40-75× relative to a single CPU core in single precision are obtained using the Cell processor and the GPU. The three implementations are extended to parallel computing clusters by making use of the Message Passing Interface (MPI). The resulting hybrid-parallel code is investigated for performance and scalability on both a GPU and a Cell computing cluster.

University of Waterloo's Institutional Repository