88,465 research outputs found

Incremental closure for systems of two variables per inequality

Authors: Howe J. M.; King A.; Simon A.
Publication venue: 'Elsevier BV'
Publication date: 05/12/2018

Subclasses of linear inequalities where each inequality has at most two variables are popular in abstract interpretation and model checking, because they strike a balance between what can be described and what can be efficiently computed. This paper focuses on the TVPI class of inequalities, for which each coefficient of each two variable inequality is unrestricted. An implied TVPI inequality can be generated from a pair of TVPI inequalities by eliminating a given common variable (echoing resolution on clauses). This operation, called result, can be applied to derive TVPI inequalities which are entailed (implied) by a given TVPI system. The key operation on TVPI is calculating closure: satisfiability can be observed from a closed system and a closed system also simplifies the calculation of other operations. A closed system can be derived by repeatedly applying the result operator. The process of adding a single TVPI inequality to an already closed input TVPI system and then finding the closure of this augmented system is called incremental closure. This too can be calculated by the repeated application of the result operator. This paper studies the calculus defined by result, the structure of result derivations, and how derivations can be combined and controlled. A series of lemmata on derivations are presented that, collectively, provide a pathway for synthesising an algorithm for incremental closure. The complexity of the incremental closure algorithm is analysed and found to be O((n^2 + m^2) lg(m)), where n is the number of variables and m the number of inequalities of the input TVPI system.
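The elimination step that the abstract calls result can be sketched in a few lines of Python. This is only an illustration of combining two inequalities to remove a shared variable, not the paper's incremental closure algorithm. The representation (a pair of a coefficient dict and a bound, read as sum(coeff * var) <= bound) and the function signature are assumptions made for this sketch.

```python
from fractions import Fraction


def result(p, q, var):
    """Eliminate `var` from two inequalities of the form sum(coeff * v) <= bound.

    Each inequality is a (coeffs, bound) pair, where coeffs maps at most two
    variable names to coefficients (hypothetical representation).
    Returns the implied inequality, or None when `var` does not occur with
    opposite signs in p and q, in which case it cannot be cancelled.
    """
    (p_coeffs, p_bound), (q_coeffs, q_bound) = p, q
    a_p = p_coeffs.get(var, 0)
    a_q = q_coeffs.get(var, 0)
    if a_p * a_q >= 0:
        return None

    # Positive multipliers chosen so the two coefficients of `var` cancel.
    scale_p, scale_q = abs(Fraction(a_q)), abs(Fraction(a_p))

    combined = {}
    for coeffs, scale in ((p_coeffs, scale_p), (q_coeffs, scale_q)):
        for name, value in coeffs.items():
            combined[name] = combined.get(name, 0) + scale * value

    # Drop variables whose coefficient became zero (this always includes `var`).
    combined = {name: value for name, value in combined.items() if value != 0}
    return combined, scale_p * p_bound + scale_q * q_bound


# x - y <= 2 and -x + 3z <= 1 together imply -y + 3z <= 3.
implied = result(({"x": 1, "y": -1}, 2), ({"x": -1, "z": 3}, 1), "x")
```

Because each input mentions at most one variable besides the eliminated one, the output again has at most two variables, so it stays within the TVPI class. Exact rationals are used here so that repeated application does not accumulate rounding error.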

City Research Online

Kent Academic Repository

An investigation of the performance portability of OpenCL

Authors: Hammond Simon D.; Herdman J. A.; Jarvis Stephen A.; Miller I.; Pennycook Simon J.; Wright Steven A.
Publication venue: 'Elsevier BV'
Publication date: 11/08/2012

This paper reports on the development of an MPI/OpenCL implementation of LU, an application-level benchmark from the NAS Parallel Benchmark Suite. An account of the design decisions addressed during the development of this code is presented, demonstrating the importance of memory arrangement and work-item/work-group distribution strategies when applications are deployed on different device types. The resulting platform-agnostic, single source application is benchmarked on a number of different architectures, and is shown to be 1.3–1.5× slower than native FORTRAN 77 or CUDA implementations on a single node and 1.3–3.1× slower on multiple nodes. We also explore the potential performance gains of OpenCL’s device fissioning capability, demonstrating up to a 3× speed-up over our original OpenCL implementation.

Warwick Research Archives Portal Repository

Study of laser deposited thin films Final report, 4 May 1967 - 4 May 1968

Author: Simon J. A. R.

Feasibility of laser deposited metal films for mirror production

NASA Technical Reports Server

WMTrace : a lightweight memory allocation tracker and analysis framework

Authors: Hammond Simon D.; Jarvis Stephen A.; Pennycook Simon J.; Perks O. F. J.
Publication date: 01/07/2011

The diverging gap between processor and memory performance has been a well discussed aspect of computer architecture literature for some years. The use of multi-core processor designs has, however, brought new problems to the design of memory architectures - increased core density without matched improvement in memory capacity is reducing the available memory per parallel process. Multiple cores accessing memory simultaneously degrades performance as a result of resource contention for memory channels and physical DIMMs. These issues combine to ensure that memory remains an on-going challenge in the design of parallel algorithms which scale. In this paper we present WMTrace, a lightweight tool to trace and analyse memory allocation events in parallel applications. This tool is able to dynamically link to pre-existing application binaries requiring no source code modification or recompilation. A post-execution analysis stage enables in-depth analysis of traces to be performed allowing memory allocations to be analysed by time, size or function. The second half of this paper features a case study in which we apply WMTrace to five parallel scientific applications and benchmarks, demonstrating its effectiveness at recording high-water mark memory consumption as well as memory use per-function over time. An in-depth analysis is provided for an unstructured mesh benchmark which reveals significant memory allocation imbalance across its participating processes.
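The idea of a high-water mark computed from allocation events can be illustrated with a small Python sketch. The trace format used here, a list of (operation, address, size) tuples, is hypothetical and is not WMTrace's actual format or API; the sketch only shows the general post-execution calculation the abstract alludes to.

```python
def high_water_mark(events):
    """Return the peak number of live bytes seen while replaying a trace.

    `events` is a hypothetical trace: an iterable of (op, address, size)
    tuples with op equal to "alloc" or "free". The size field is ignored
    for frees, since the size is looked up from the matching allocation.
    """
    live = {}      # address -> size of the allocation currently held there
    current = 0    # bytes live at this point in the trace
    peak = 0       # largest value `current` has reached so far
    for op, address, size in events:
        if op == "alloc":
            live[address] = size
            current += size
            peak = max(peak, current)
        elif op == "free":
            # Frees of unknown addresses are ignored rather than treated as errors.
            current -= live.pop(address, 0)
    return peak


trace = [
    ("alloc", 0x1000, 100),
    ("alloc", 0x2000, 50),
    ("free", 0x1000, 0),
    ("alloc", 0x3000, 30),
]
peak_bytes = high_water_mark(trace)  # 150: reached after the second allocation
```

Extending each event with a timestamp or calling function would allow the same replay loop to group usage by time or by function, in the spirit of the analyses the abstract describes.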

Warwick Research Archives Portal Repository

New low-mass members of the Octans stellar association and an updated 30-40 Myr lithium age

Authors: Lawson Warrick A.; Murphy Simon J.
Publication venue: 'Oxford University Press (OUP)'
Publication date: 01/01/2014

The Octans association is one of several young stellar moving groups recently discovered in the Solar neighbourhood, and hence a valuable laboratory for studies of stellar, circumstellar disc and planetary evolution. However, a lack of low-mass members or any members with trigonometric parallaxes means the age, distance and space motion of the group are poorly constrained. To better determine its membership and age, we present the first spectroscopic survey for new K and M-type Octans members, resulting in the discovery of 29 UV-bright K5-M4 stars with kinematics, photometry and distances consistent with existing members. Nine new members possess strong Li I absorption, which allows us to estimate a lithium age of 30-40 Myr, similar to that of the Tucana-Horologium association and bracketed by the firm lithium depletion boundary ages of the Beta Pictoris (20 Myr) and Argus/IC 2391 (50 Myr) associations. Several stars also show hints in our medium-resolution spectra of fast rotation or spectroscopic binarity. More so than other nearby associations, Octans is much larger than its age and internal velocity dispersion imply. It may be the dispersing remnant of a sparse, extended structure which includes some younger members of the foreground Octans-Near association recently proposed by Zuckerman and collaborators. Comment: Accepted for publication in MNRAS (16 pages, 5 tables

arXiv.org e-Print Archive

CiteSeerX

Developing performance-portable molecular dynamics kernels in OpenCL

Authors: Jarvis Stephen A.; Pennycook Simon J.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/11/2012

This paper investigates the development of a molecular dynamics code that is highly portable between architectures. Using OpenCL, we develop an implementation of Sandia’s miniMD benchmark that achieves good levels of performance across a wide range of hardware: CPUs, discrete GPUs and integrated GPUs. We demonstrate that the performance bottlenecks of miniMD’s short-range force calculation kernel are the same across these architectures, and detail a number of platform-agnostic optimisations that improve its performance by at least 2x on all hardware considered. Our complete code is shown to be 1.7x faster than the original miniMD, and at most 2x slower than implementations individually hand-tuned for a specific architecture.

Warwick Research Archives Portal Repository