Incremental closure for systems of two variables per inequality
Subclasses of linear inequalities where each inequality has at most two variables are popular in abstract interpretation and model checking, because they strike a balance between what can be described and what can be efficiently computed. This paper focuses on the TVPI class of inequalities, for which each coefficient of each two-variable inequality is unrestricted. An implied TVPI inequality can be generated from a pair of TVPI inequalities by eliminating a given common variable (echoing resolution on clauses). This operation, called result, can be applied to derive TVPI inequalities which are entailed (implied) by a given TVPI system. The key operation on TVPI is calculating closure: satisfiability can be observed from a closed system, and a closed system also simplifies the calculation of other operations. A closed system can be derived by repeatedly applying the result operator. The process of adding a single TVPI inequality to an already closed input TVPI system and then finding the closure of this augmented system is called incremental closure. This too can be calculated by the repeated application of the result operator. This paper studies the calculus defined by result, the structure of result derivations, and how derivations can be combined and controlled. A series of lemmata on derivations are presented that, collectively, provide a pathway for synthesising an algorithm for incremental closure. The complexity of the incremental closure algorithm is analysed and found to be $O((n^2 + m^2)\lg m)$, where n is the number of variables and m the number of inequalities of the input TVPI system.
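To make the result operator concrete, here is a minimal Python sketch. It assumes a hypothetical encoding of an inequality as a dictionary of at most two variable coefficients plus a constant, read as sum(coeffs[v] * v) <= const; the names make, result, one_round and is_contradiction are illustrative and do not come from the paper. one_round is only a naive single derivation step over all pairs; it is not the paper's incremental closure algorithm and performs no redundancy removal.

```python
from fractions import Fraction
from typing import Dict, List, Optional, Tuple

# Hypothetical encoding (an assumption, not the paper's): an inequality
# sum(coeffs[v] * v) <= const over at most two variables.
Ineq = Tuple[Dict[str, Fraction], Fraction]


def make(coeffs: Dict[str, int], const: int) -> Ineq:
    """Build a TVPI inequality, dropping zero coefficients."""
    cs = {v: Fraction(a) for v, a in coeffs.items() if a != 0}
    if len(cs) > 2:
        raise ValueError("a TVPI inequality has at most two variables")
    return cs, Fraction(const)


def result(e1: Ineq, e2: Ineq, z: str) -> Optional[Ineq]:
    """Eliminate z from e1 and e2 by a non-negative combination.

    Returns None unless z occurs in both inequalities with opposite signs.
    """
    (c1, k1), (c2, k2) = e1, e2
    a, b = c1.get(z, Fraction(0)), c2.get(z, Fraction(0))
    if a * b >= 0:  # z missing from one side, or same sign: z cannot cancel
        return None
    # Scale each inequality by the magnitude of the other's z-coefficient,
    # so that |b|*a + |a|*b == 0 and z drops out of the sum.
    m1, m2 = abs(b), abs(a)
    combined: Dict[str, Fraction] = {}
    for v in set(c1) | set(c2):
        coef = m1 * c1.get(v, Fraction(0)) + m2 * c2.get(v, Fraction(0))
        if v != z and coef != 0:
            combined[v] = coef
    # At most one non-z variable comes from each input, so at most two remain.
    return combined, m1 * k1 + m2 * k2


def one_round(system: List[Ineq]) -> List[Ineq]:
    """Apply result to every pair of inequalities and every shared variable."""
    derived = []
    for i, e1 in enumerate(system):
        for e2 in system[i + 1:]:
            for z in sorted(set(e1[0]) & set(e2[0])):
                r = result(e1, e2, z)
                if r is not None:
                    derived.append(r)
    return derived


def is_contradiction(e: Ineq) -> bool:
    """0 <= const with const < 0 means the system is unsatisfiable."""
    coeffs, const = e
    return not coeffs and const < 0
```

For example, eliminating z from x - z <= 1 and z - y <= 2 yields x - y <= 3, while combining x <= 0 with -x <= -1 yields 0 <= -1, which exposes unsatisfiability. Exact rational arithmetic (Fraction) is used so that derived coefficients are not perturbed by floating-point rounding.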
An investigation of the performance portability of OpenCL
This paper reports on the development of an MPI/OpenCL implementation of LU, an application-level benchmark from the NAS Parallel Benchmark Suite. An account of the design decisions addressed during the development of this code is presented, demonstrating the importance of memory arrangement and work-item/work-group distribution strategies when applications are deployed on different device types. The resulting platform-agnostic, single-source application is benchmarked on a number of different architectures, and is shown to be 1.3–1.5× slower than native FORTRAN 77 or CUDA implementations on a single node and 1.3–3.1× slower on multiple nodes. We also explore the potential performance gains of OpenCL’s device fissioning capability, demonstrating up to a 3× speed-up over our original OpenCL implementation.
Study of laser deposited thin films. Final report, 4 May 1967 - 4 May 1968
Feasibility of laser deposited metal films for mirror production.
WMTrace: a lightweight memory allocation tracker and analysis framework
The diverging gap between processor and memory performance has been a well-discussed aspect of computer architecture literature for some years. The use of multi-core processor designs has, however, brought new problems to the design of memory architectures: increased core density without matched improvement in memory capacity is reducing the available memory per parallel process. Multiple cores accessing memory simultaneously degrades performance as a result of resource contention for memory channels and physical DIMMs. These issues combine to ensure that memory remains an ongoing challenge in the design of parallel algorithms which scale. In this paper we present WMTrace, a lightweight tool to trace and analyse memory allocation events in parallel applications. This tool is able to dynamically link to pre-existing application binaries, requiring no source code modification or recompilation. A post-execution analysis stage enables in-depth analysis of traces, allowing memory allocations to be analysed by time, size or function. The second half of this paper features a case study in which we apply WMTrace to five parallel scientific applications and benchmarks, demonstrating its effectiveness at recording high-water mark memory consumption as well as memory use per function over time. An in-depth analysis is provided for an unstructured mesh benchmark, which reveals significant memory allocation imbalance across its participating processes.
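The abstract does not describe WMTrace's trace format, so the following Python sketch assumes a hypothetical event record (time, kind, address, size, allocating function) and shows one way a post-execution pass could replay such a trace to find the high-water mark and attribute the bytes live at that moment to the functions that allocated them. Event and high_water_mark are illustrative names, not WMTrace's API.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class Event(NamedTuple):
    """One hypothetical trace record; WMTrace's real format is not given here."""
    time: float
    kind: str   # "malloc" or "free"
    addr: int
    size: int   # bytes requested (ignored for "free")
    func: str   # allocating function (ignored for "free")


def high_water_mark(events: List[Event]) -> Tuple[int, float, Dict[str, int]]:
    """Replay events in time order.

    Returns (peak live bytes, time of the peak, live bytes per allocating
    function at the peak). Frees of unknown addresses are ignored.
    """
    live: Dict[int, Tuple[int, str]] = {}  # addr -> (size, allocating function)
    current = 0
    peak, peak_time = 0, 0.0
    peak_by_func: Dict[str, int] = {}
    for ev in sorted(events, key=lambda e: e.time):
        if ev.kind == "malloc":
            live[ev.addr] = (ev.size, ev.func)
            current += ev.size
            if current > peak:
                # New high-water mark: snapshot per-function live bytes.
                peak, peak_time = current, ev.time
                by_func: Dict[str, int] = defaultdict(int)
                for size, func in live.values():
                    by_func[func] += size
                peak_by_func = dict(by_func)
        elif ev.kind == "free" and ev.addr in live:
            size, _ = live.pop(ev.addr)
            current -= size
    return peak, peak_time, peak_by_func
```

Recomputing the per-function breakdown only when a new peak is reached keeps the replay simple; a production tool would more likely maintain running per-function totals, and would also need to handle calls such as realloc, which this sketch omits.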
New low-mass members of the Octans stellar association and an updated 30-40 Myr lithium age
The Octans association is one of several young stellar moving groups recently discovered in the Solar neighbourhood, and hence a valuable laboratory for studies of stellar, circumstellar disc and planetary evolution. However, a lack of low-mass members or any members with trigonometric parallaxes means the age, distance and space motion of the group are poorly constrained. To better determine its membership and age, we present the first spectroscopic survey for new K- and M-type Octans members, resulting in the discovery of 29 UV-bright K5-M4 stars with kinematics, photometry and distances consistent with existing members. Nine new members possess strong Li I absorption, which allows us to estimate a lithium age of 30-40 Myr, similar to that of the Tucana-Horologium association and bracketed by the firm lithium depletion boundary ages of the Beta Pictoris (20 Myr) and Argus/IC 2391 (50 Myr) associations. Several stars also show hints of fast rotation or spectroscopic binarity in our medium-resolution spectra. More so than other nearby associations, Octans is much larger than its age and internal velocity dispersion imply. It may be the dispersing remnant of a sparse, extended structure which includes some younger members of the foreground Octans-Near association recently proposed by Zuckerman and collaborators.
Comment: Accepted for publication in MNRAS (16 pages, 5 tables)
Developing performance-portable molecular dynamics kernels in OpenCL
This paper investigates the development of a molecular dynamics code that is highly portable between architectures. Using OpenCL, we develop an implementation of Sandia’s miniMD benchmark that achieves good levels of performance across a wide range of hardware: CPUs, discrete GPUs and integrated GPUs.
We demonstrate that the performance bottlenecks of miniMD’s short-range force calculation kernel are the same across these architectures, and detail a number of platform-agnostic optimisations that improve its performance by at least 2x on all hardware considered. Our complete code is shown to be 1.7x faster than the original miniMD, and at most 2x slower than implementations individually hand-tuned for a specific architecture.
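In its Lennard-Jones mode, miniMD's short-range force calculation evaluates a cut-off pair potential over a neighbour list. The Python sketch below shows the shape of such a loop under stated assumptions (reduced units, a precomputed full neighbour list, a single cutoff radius); lj_forces is a hypothetical name, and this is an illustration of the technique, not the paper's OpenCL kernel.

```python
from typing import List, Sequence, Tuple

Vec = Tuple[float, float, float]


def lj_forces(pos: Sequence[Vec], neighbors: Sequence[Sequence[int]],
              cutoff: float, epsilon: float = 1.0, sigma: float = 1.0) -> List[Vec]:
    """Short-range Lennard-Jones forces over a full neighbour list.

    neighbors[i] lists every j near i (so j appears in i's list and i in j's),
    which lets each atom accumulate only its own force: no two outer-loop
    iterations write to the same entry.
    """
    cutsq = cutoff * cutoff
    sigma6 = sigma ** 6
    forces: List[Vec] = []
    for i, (xi, yi, zi) in enumerate(pos):
        fx = fy = fz = 0.0
        for j in neighbors[i]:
            dx, dy, dz = xi - pos[j][0], yi - pos[j][1], zi - pos[j][2]
            rsq = dx * dx + dy * dy + dz * dz
            if rsq < cutsq:
                sr2 = 1.0 / rsq
                sr6 = sr2 ** 3 * sigma6
                # F(r)/r for V(r) = 4*eps*((sigma/r)^12 - (sigma/r)^6)
                force = 48.0 * epsilon * sr6 * (sr6 - 0.5) * sr2
                fx += dx * force
                fy += dy * force
                fz += dz * force
        forces.append((fx, fy, fz))
    return forces
```

Iterating over a full neighbour list computes each pair twice but gives every work-item exclusive ownership of its output, which avoids write conflicts on parallel hardware; a half list exploiting Newton's third law halves the arithmetic but needs atomic or otherwise ordered updates. Which variant the paper adopts is not stated in the abstract.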
Benefits of Session Types for Software Development
Session types are a formalism used to specify and check the correctness of communication-based systems. Within their scope, they can guarantee the absence of communication errors such as deadlock, sending an unexpected message or failing to handle an incoming message. Introduced over two decades ago, they have developed into a significant theme in programming languages. In this paper we examine the beliefs that drive research into this area and make it popular. We look at the claims and motivation behind session types throughout the literature. We identify the hypotheses upon which session types have been designed and implemented, and attempt to clarify and formulate them in a form more suitable for testing.