Search CORE

4,642 research outputs found

GPU Concurrency: Weak Behaviours and Programming Assumptions

Author: AMD.
AMD.
AMD.
Cederman D.
Cederman D.
Collier W.
Core ARM.
Feng W.
Hower D. R.
Hwu W.-m. W.
Khronos OpenCL Working Group
Sanders J.
Sorensen T.
Southern AMD.
Stuart J. A.
Weaver D. L.
Xiao S.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 10/11/2014
Field of study

Concurrency is pervasive and perplexing, particularly on graphics processing units (GPUs). Current specifications of languages and hardware are inconclusive; thus programmers often rely on folklore assumptions when writing software. To remedy this state of affairs, we conducted a large empirical study of the concurrent behaviour of deployed GPUs. Armed with litmus tests (i.e. short concurrent programs), we questioned the assumptions in programming guides and vendor documentation about the guarantees provided by hardware. We developed a tool to generate thousands of litmus tests and run them under stressful workloads. We observed a litany of previously elusive weak behaviours, and exposed folklore beliefs about GPU programming---often supported by official tutorials---as false. As a way forward, we propose a model of Nvidia GPU hardware, which correctly models every behaviour witnessed in our experiments. The model is a variant of SPARC Relaxed Memory Order (RMO), structured following the GPU concurrency hierarchy

Crossref

Oxford University Research Archive

Kent Academic Repository

Spiral - Imperial College Digital Repository

Automated Analysis of Weak Memory Models

Author: Yushkovskiy Artem
Publication venue
Publication date: 20/08/2018
Field of study

Software verification is considered to be a hard computational problem vulnerable to the state explosion problem. Concurrent software verification raises the complexity of the problem to a power determined by all the possible interleavings of states of the system. Moreover, the architecture of a modern shared-memory multi-core processor and optimisations performed by a compiler can cause program behaviour that is unexpected from the point of view of traditional concurrency. The guarantees that an execution environment can provide to a programmer are formalised in its Weak Memory Model (WMM). Over the last decade, weak memory models were defined for multiple hardware architectures and programming languages. This opens new challenges in software verification with respect to a weak memory model. Most existing tools that perform memory model-aware software analysis tools examine behaviours of the program against a single memory model. The first tool that analyses the portability of a concurrent program from one platform to another is Porthos [PFH+17a] released in April 2017. Porthos can verify that the program is portable from the source platform S to the target platform T by checking that the program has no extra states under T. For that, it performs an SMT-based bounded reachability analysis by encoding the constraints of the program and two memory models M_S and M_T into a single SMT-formula. Although the approach has been proven to be efficient, the tool accepts as input the small C-like toy language. Current thesis aims to rework Porthos by extending its input language, so that it is able to process real-world C programs. However, current implementation of Porthos can be considered hardly extensible, which raises the need to redesign its whole architecture in order to increase the robustness, transparency, efficiency and extensibility. The result of the work is PorthosC, a framework for SMT-based memory model-aware analysis

Aaltodoc Publication Archive

Remote-scope Promotion: Clarified, Rectified, and Verified

Author: Cederman D.
Kyriazis G.
Munshi A.
Nipkow T.
Wickerson J.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/10/2015
Field of study

Modern accelerator programming frameworks, such as OpenCL, organise threads into work-groups. Remote-scope promotion (RSP) is a language extension recently proposed by AMD researchers that is designed to enable applications, for the first time, both to optimise for the common case of intra-work-group communication (using memory scopes to provide consistency only within a work-group) and to allow occasional inter-work-group communication (as required, for instance, to support the popular load-balancing idiom of work stealing). We present the first formal, axiomatic memory model of OpenCL extended with RSP. We have extended the Herd memory model simulator with support for OpenCL kernels that exploit RSP, and used it to discover bugs in several litmus tests and a work-stealing queue, that have been used previously in the study of RSP. We have also formalised the proposed GPU implementation of RSP. The formalisation process allowed us to identify bugs in the description of RSP that could result in well-synchronised programs experiencing memory inconsistencies. We present and prove sound a new implementation of RSP that incorporates bug fixes and requires less non-standard hardware than the original implementation. This work, a collaboration between academia and industry, clearly demonstrates how, when designing hardware support for a new concurrent language feature, the early application of formal tools and techniques can help to prevent errors, such as those we have found, from making it into silicon

CiteSeerX

Crossref

Kent Academic Repository

Spiral - Imperial College Digital Repository

An Abstract Machine for Unification Grammars

Author: Wintner Shuly
Publication venue
Publication date: 01/01/1997
Field of study

This work describes the design and implementation of an abstract machine, Amalia, for the linguistic formalism ALE, which is based on typed feature structures. This formalism is one of the most widely accepted in computational linguistics and has been used for designing grammars in various linguistic theories, most notably HPSG. Amalia is composed of data structures and a set of instructions, augmented by a compiler from the grammatical formalism to the abstract instructions, and a (portable) interpreter of the abstract instructions. The effect of each instruction is defined using a low-level language that can be executed on ordinary hardware. The advantages of the abstract machine approach are twofold. From a theoretical point of view, the abstract machine gives a well-defined operational semantics to the grammatical formalism. This ensures that grammars specified using our system are endowed with well defined meaning. It enables, for example, to formally verify the correctness of a compiler for HPSG, given an independent definition. From a practical point of view, Amalia is the first system that employs a direct compilation scheme for unification grammars that are based on typed feature structures. The use of amalia results in a much improved performance over existing systems. In order to test the machine on a realistic application, we have developed a small-scale, HPSG-based grammar for a fragment of the Hebrew language, using Amalia as the development platform. This is the first application of HPSG to a Semitic language.Comment: Doctoral Thesis, 96 pages, many postscript figures, uses pstricks, pst-node, psfig, fullname and a macros fil

arXiv.org e-Print Archive

CiteSeerX

Architectural Refinement in HETS

Author: Codescu Mihai
Publication venue
Publication date: 01/01/2012
Field of study

The main objective of this work is to bring a number of improvements to the Heterogeneous Tool Set HETS, both from a theoretical and an implementation point of view. In the first part of the thesis we present a number of recent extensions of the tool, among which declarative specifications of logics, generalized theoroidal comorphisms, heterogeneous colimits and integration of the logic of the term rewriting system Maude. In the second part we concentrate on the CASL architectural refinement language, that we equip with a notion of refinement tree and with calculi for checking correctness and consistency of refinements. Soundness and completeness of these calculi is also investigated. Finally, we present the integration of the VSE refinement method in HETS as an institution comorphism. Thus, the proof manangement component of HETS remains unmodified

E-LIB Dokumentserver - Staats und Universitätsbibliothek Bremen

Isla: Integrating full-scale ISA semantics and axiomatic concurrency models (extended version)

Author: Armstrong Alasdair
Campbell Brian
Pulte Christopher
Sewell Peter
Simner Ben
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 12/05/2023
Field of study

Edinburgh Research Explorer