1,772 research outputs found

    Developing performance-portable molecular dynamics kernels in OpenCL

    This paper investigates the development of a molecular dynamics code that is highly portable between architectures. Using OpenCL, we develop an implementation of Sandia’s miniMD benchmark that achieves good levels of performance across a wide range of hardware: CPUs, discrete GPUs and integrated GPUs. We demonstrate that the performance bottlenecks of miniMD’s short-range force calculation kernel are the same across these architectures, and detail a number of platform-agnostic optimisations that improve its performance by at least 2x on all hardware considered. Our complete code is shown to be 1.7x faster than the original miniMD, and at most 2x slower than implementations individually hand-tuned for a specific architecture.
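
    A typical short-range force kernel of the kind the abstract refers to computes a cut-off Lennard-Jones interaction over per-atom neighbour lists. The plain-C sketch below is an illustrative reconstruction of that computation only; the function name, parameters and data layout are assumptions rather than miniMD's actual kernel. It is this loop structure that an OpenCL port maps onto work-items and then optimises.

    ```c
    /* Illustrative cut-off Lennard-Jones force loop over neighbour lists
     * (a sketch of the computation, not miniMD's actual kernel).        */
    #include <stddef.h>

    void lj_forces(size_t n_atoms,
                   const double (*pos)[3],   /* atom positions              */
                   double (*force)[3],       /* accumulated forces (output) */
                   const int *neigh,         /* flattened neighbour lists   */
                   const int *num_neigh,     /* neighbour count per atom    */
                   int max_neigh,
                   double cutoff_sq,         /* squared cut-off radius      */
                   double epsilon, double sigma)
    {
        double sigma6 = sigma * sigma * sigma;
        sigma6 *= sigma6;                                /* sigma^6 */
        for (size_t i = 0; i < n_atoms; ++i) {
            double fx = 0.0, fy = 0.0, fz = 0.0;
            for (int k = 0; k < num_neigh[i]; ++k) {
                int j = neigh[i * (size_t)max_neigh + k];
                double dx = pos[i][0] - pos[j][0];
                double dy = pos[i][1] - pos[j][1];
                double dz = pos[i][2] - pos[j][2];
                double r2 = dx * dx + dy * dy + dz * dz;
                if (r2 < cutoff_sq) {                    /* short-range cut-off */
                    double inv_r2 = 1.0 / r2;
                    double sr6 = sigma6 * inv_r2 * inv_r2 * inv_r2;  /* (sigma/r)^6 */
                    double f = 48.0 * epsilon * sr6 * (sr6 - 0.5) * inv_r2;
                    fx += f * dx; fy += f * dy; fz += f * dz;
                }
            }
            force[i][0] += fx; force[i][1] += fy; force[i][2] += fz;
        }
    }
    ```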

    The Formation and Stability of a Microbial Community

    New communities form regularly in nature, as many species rush to colonise a freshly formed island, pool, or microbiome, but it is unclear what rules govern the arrangement of these founders into a smaller, stable community, or whether the process is predictable. I simultaneously inoculated a master mix of bacterial colonisers into 45 identical environments, and allowed them to compete and evolve for around three months. By the end of the experiment, the species compositions of these communities had split into two broad groups, defined mostly by the mutual exclusivity of two Pseudomonas species, which may reflect ecological equivalence between the two species. Due to this functional similarity, I propose that community formation may be predictable at an ecological level, if not a taxonomic level. I also explored one of the communities formed in this experiment in further detail, investigating the maintenance of its diversity and stability. The community was fairly stable, as every species was able to persist even when it began at a much lower population size than its competitors, and no diversity was lost after 4 weeks of culture. I grew the species from this community in monoculture, as well as in every possible pair, triplet, and quartet, to fully assess the network of interactions, and found evidence for many significant higher-order interactions, which have been shown to have a stabilising effect in theoretical models.

    Impurity Lattice and Sublattice Location by Electron Channeling

    A new formulation is presented for the use of crystallographic orientation effects in electron scattering to determine impurity lattice location. The development of electron channeling techniques is reviewed and compared to high energy ion channeling and to the Borrmann effect in x-ray diffraction. The advantages of axial over planar geometry are discussed. Delocalization effects are more serious for quantitative analysis than has generally been believed. The new formulation applies to any crystal lattice and quantitatively includes delocalization effects via c-factors, which have been experimentally determined for diamond structure semiconductors. For sublattice site location this formulation removes the two major approximations of the original ALCHEMI formulation, namely that all the inner shell excitations are perfectly localized, and that all of the impurity atoms occupy distinct crystallographic sites. As an example, we study the location of small perfectly coherent Sb precipitates within the Si lattice.

    WMTrace: a lightweight memory allocation tracker and analysis framework

    The diverging gap between processor and memory performance has been a well-discussed aspect of computer architecture literature for some years. The use of multi-core processor designs has, however, brought new problems to the design of memory architectures - increased core density without matched improvement in memory capacity is reducing the available memory per parallel process. Multiple cores accessing memory simultaneously degrades performance as a result of resource contention for memory channels and physical DIMMs. These issues combine to ensure that memory remains an ongoing challenge in the design of parallel algorithms that scale. In this paper we present WMTrace, a lightweight tool to trace and analyse memory allocation events in parallel applications. This tool is able to dynamically link to pre-existing application binaries, requiring no source code modification or recompilation. A post-execution analysis stage enables in-depth analysis of traces to be performed, allowing memory allocations to be analysed by time, size or function. The second half of this paper features a case study in which we apply WMTrace to five parallel scientific applications and benchmarks, demonstrating its effectiveness at recording high-water-mark memory consumption as well as per-function memory use over time. An in-depth analysis is provided for an unstructured mesh benchmark, which reveals significant memory allocation imbalance across its participating processes.
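
    The abstract states that WMTrace attaches to unmodified binaries at run time; on Linux, one common mechanism for this style of tracing is an LD_PRELOAD shim that interposes on the allocator. The C sketch below illustrates that general technique only and is not WMTrace's actual implementation.

    ```c
    /* Illustrative LD_PRELOAD allocation tracer (not WMTrace's own code).
     * Build:  gcc -shared -fPIC -o tracer.so tracer.c -ldl
     * Run:    LD_PRELOAD=./tracer.so ./application                        */
    #define _GNU_SOURCE
    #include <dlfcn.h>
    #include <stdio.h>
    #include <stdlib.h>

    static void *(*real_malloc)(size_t) = NULL;
    static void  (*real_free)(void *)   = NULL;
    static __thread int in_hook = 0;          /* avoid re-entrant logging */

    static void init_hooks(void)
    {
        real_malloc = (void *(*)(size_t))dlsym(RTLD_NEXT, "malloc");
        real_free   = (void (*)(void *))dlsym(RTLD_NEXT, "free");
    }

    void *malloc(size_t size)
    {
        if (!real_malloc) init_hooks();
        void *p = real_malloc(size);
        if (!in_hook) {                       /* record size and address of each allocation */
            in_hook = 1;
            fprintf(stderr, "alloc %zu bytes at %p\n", size, p);
            in_hook = 0;
        }
        return p;
    }

    void free(void *ptr)
    {
        if (!real_free) init_hooks();
        if (!in_hook && ptr) {
            in_hook = 1;
            fprintf(stderr, "free  %p\n", ptr);
            in_hook = 0;
        }
        real_free(ptr);
    }
    ```

    A production tool of the kind the paper describes would write compact per-process trace records (with timestamps) for later post-execution analysis rather than printing to stderr, but the interposition idea is the same.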

    Evaluating the performance of legacy applications on emerging parallel architectures

    The gap between a supercomputer's theoretical maximum (“peak”) floating-point performance and that actually achieved by applications has grown wider over time. Today, a typical scientific application achieves only 5-20% of any given machine's peak processing capability, and this gap leaves room for significant improvements in execution times. This problem is most pronounced for modern “accelerator” architectures: collections of hundreds of simple, low-clocked cores capable of executing the same instruction on dozens of pieces of data simultaneously. This is a significant change from the low number of high-clocked cores found in traditional CPUs, and effective utilisation of accelerators typically requires extensive code and algorithmic changes. In many cases, the best way in which to map a parallel workload to these new architectures is unclear. The principal focus of the work presented in this thesis is the evaluation of emerging parallel architectures (specifically, modern CPUs, GPUs and Intel MIC) for two benchmark codes, the LU benchmark from the NAS Parallel Benchmark Suite and Sandia's miniMD benchmark, which exhibit complex parallel behaviours that are representative of many scientific applications. Using combinations of low-level intrinsic functions, OpenMP, CUDA and MPI, we demonstrate performance improvements of up to 7x for these workloads. We also detail a code development methodology that permits application developers to target multiple architecture types without maintaining completely separate implementations for each platform. Using OpenCL, we develop performance-portable implementations of the LU and miniMD benchmarks that are faster than the original codes, and at most 2x slower than versions highly tuned for particular hardware. Finally, we demonstrate the importance of evaluating architectures at scale (as opposed to on single nodes) through performance modelling techniques, highlighting the problems associated with strong scaling on emerging accelerator architectures.

    Parallelising wavefront applications on general-purpose GPU devices

    Pipelined wavefront applications form a large portion of the high performance scientific computing workloads at supercomputing centres. This paper investigates the viability of graphics processing units (GPUs) for the acceleration of these codes, using NVIDIA's Compute Unified Device Architecture (CUDA). We identify the optimisations suitable for this new architecture and quantify the characteristics of those wavefront codes that are likely to experience speedups.
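
    The defining feature of this class of code is that each grid cell needs results from its already-swept neighbours, so only cells lying on the same diagonal (hyperplane) are independent. The generic C sketch below (not taken from the paper) shows the diagonal-ordered loop structure; on a GPU the inner loop is what gets mapped onto CUDA threads, which is why the available parallelism grows and shrinks as the sweep progresses.

    ```c
    /* Generic 2D wavefront sweep: cell (i,j) depends on (i-1,j) and (i,j-1).
     * Iterating over diagonals d = i + j makes all cells on one diagonal
     * independent of each other.  Illustrative only; not the paper's code. */
    void wavefront_sweep(int nx, int ny, double *grid /* nx * ny, row-major */)
    {
        for (int d = 2; d <= nx + ny - 2; ++d) {              /* sequential over diagonals */
            int i_lo = (d - (ny - 1) > 1) ? d - (ny - 1) : 1; /* clip diagonal to the grid */
            int i_hi = (d - 1 < nx - 1) ? d - 1 : nx - 1;
            for (int i = i_lo; i <= i_hi; ++i) {              /* independent: parallelise here */
                int j = d - i;                                /* 1 <= j <= ny-1 by construction */
                grid[i * ny + j] = 0.5 * (grid[(i - 1) * ny + j] + grid[i * ny + (j - 1)]);
            }
        }
    }
    ```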

    Experiences with porting and modelling wavefront algorithms on many-core architectures

    We are currently investigating the viability of many-core architectures for the acceleration of wavefront applications and this report focuses on graphics processing units (GPUs) in particular. To this end, we have implemented NASA’s LU benchmark – a real-world production-grade application – on GPUs employing NVIDIA’s Compute Unified Device Architecture (CUDA). This GPU implementation of the benchmark has been used to investigate the performance of a selection of GPUs, ranging from workstation-grade commodity GPUs to the HPC “Tesla” and “Fermi” GPUs. We have also compared the performance of the GPU solution at scale to that of traditional high performance computing (HPC) clusters based on a range of multi-core CPUs from a number of major vendors, including Intel (Nehalem), AMD (Opteron) and IBM (PowerPC). In previous work we have developed a predictive “plug-and-play” performance model of this class of application running on such clusters, in which CPUs communicate via the Message Passing Interface (MPI). By extending this model to also capture the performance behaviour of GPUs, we are able to: (1) comment on the effects that architectural changes will have on the performance of single-GPU solutions, and (2) make projections regarding the performance of multi-GPU solutions at larger scale.
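
    The abstract does not reproduce the model itself, but analytic models of pipelined wavefront codes generally share one structure: on a grid of P_x by P_y processors the sweep must fill (and later drain) a pipeline, so the runtime is roughly the number of wavefront steps multiplied by a per-step compute-plus-communication cost. The expression below is a generic sketch of that structure, stated as an assumption; it is not the authors' specific “plug-and-play” model.

    ```latex
    % Generic pipelined-wavefront cost sketch (illustrative; not the paper's exact model).
    % P_x, P_y : processor grid dimensions; n_k : number of blocks the local domain is
    % swept in; T_comp, T_comm : per-block compute and boundary-exchange costs.
    T_{\text{sweep}} \;\approx\; \left( P_x + P_y - 2 + n_k \right)
                                 \left( T_{\text{comp}} + T_{\text{comm}} \right)
    ```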

    Supercurrent through grain boundaries in the presence of strong correlations

    Strong correlations are known to severely reduce the mobility of charge carriers near half-filling and thus have an important influence on the current-carrying properties of grain boundaries in the high-$T_c$ cuprates. In this work we present an extension of the Gutzwiller projection approach to treat electronic correlations below as well as above half-filling consistently. We apply this method to investigate the critical current through grain boundaries with a wide range of misalignment angles for electron- and hole-doped systems. For the latter, excellent agreement with experimental data is found. We further provide a detailed comparison to an analogous weak-coupling evaluation. Comment: 4 pages, 3 figures.
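
    As background (standard results from the Gutzwiller approximation literature, not details taken from this paper): the projection that suppresses double occupancy is commonly replaced by doping-dependent renormalisation factors for the kinetic and spin-exchange terms, which for hole doping δ (the deviation from half-filling) are usually written as below. The paper's contribution is an extension that treats fillings below and above half-filling on the same footing.

    ```latex
    % Standard Gutzwiller renormalisation factors for hole doping \delta
    % (background only; the paper generalises beyond this hole-doped form).
    g_t = \frac{2\delta}{1+\delta}, \qquad g_s = \frac{4}{(1+\delta)^2}
    ```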