Modularizing and Specifying Protocols among Threads

Author: Arbab Farhad
Jongmans Sung-Shik T. Q.
Publication venue: 'Open Publishing Association'
Publication date: 01/02/2013

We identify three problems with current techniques for implementing protocols among threads, which complicate and impair the scalability of multicore software development: implementing synchronization, implementing coordination, and modularizing protocols. To mend these deficiencies, we argue for the use of domain-specific languages (DSLs) based on existing models of concurrency. To demonstrate the feasibility of this proposal, we explain how to use the model of concurrency Reo as a high-level protocol DSL, which offers appropriate abstractions and a natural separation of protocols and computations. We describe a Reo-to-Java compiler and illustrate its use through examples. Comment: In Proceedings PLACES 2012, arXiv:1302.579

arXiv.org e-Print Archive

Directory of Open Access Journals

Compiling Vector Pascal to the XeonPhi

Author: Bik
Budd
Chamberlain
Cockshott
Cockshott
Ewing
Grelck
Iverson
Keßler
Krishnaiyer
Lin
Pater
Perrott
Perrott
Scholz
Siebert
Snyder
Tousimojarad
Publication venue: 'Wiley'
Publication date: 26/03/2015

Intel's XeonPhi is a highly parallel x86 architecture chip. It has a number of novel features which make it a particularly challenging target for the compiler writer. This paper describes the techniques used to port the Glasgow Vector Pascal Compiler to this architecture and assesses its performance by comparing the XeonPhi with three other machines running the same algorithms.

Enlighten: Research Data (University of Glasgow)

Crossref

Enlighten

Remote-scope Promotion: Clarified, Rectified, and Verified

Author: Cederman D.
Kyriazis G.
Munshi A.
Nipkow T.
Wickerson J.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/10/2015

Modern accelerator programming frameworks, such as OpenCL, organise threads into work-groups. Remote-scope promotion (RSP) is a language extension recently proposed by AMD researchers that is designed to enable applications, for the first time, both to optimise for the common case of intra-work-group communication (using memory scopes to provide consistency only within a work-group) and to allow occasional inter-work-group communication (as required, for instance, to support the popular load-balancing idiom of work stealing). We present the first formal, axiomatic memory model of OpenCL extended with RSP. We have extended the Herd memory model simulator with support for OpenCL kernels that exploit RSP, and used it to discover bugs in several litmus tests and a work-stealing queue that have been used previously in the study of RSP. We have also formalised the proposed GPU implementation of RSP. The formalisation process allowed us to identify bugs in the description of RSP that could result in well-synchronised programs experiencing memory inconsistencies. We present and prove sound a new implementation of RSP that incorporates bug fixes and requires less non-standard hardware than the original implementation. This work, a collaboration between academia and industry, clearly demonstrates how, when designing hardware support for a new concurrent language feature, the early application of formal tools and techniques can help to prevent errors, such as those we have found, from making it into silicon.
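The work-stealing idiom mentioned in this abstract can be illustrated with a minimal sketch. It is not the queue studied in the paper and it does not model OpenCL memory scopes or RSP: a single lock stands in for the scoped atomics a GPU implementation would use, and the class name `WorkStealingDeque` and its methods are hypothetical. The owner pushes and pops at one end of its deque, while an idle worker occasionally steals from the other end.

```python
import threading
from collections import deque


class WorkStealingDeque:
    """Toy work-stealing deque (hypothetical; a lock replaces scoped atomics)."""

    def __init__(self):
        self._items = deque()
        self._lock = threading.Lock()

    def push(self, task):
        # Owner adds new work at the bottom of the deque.
        with self._lock:
            self._items.append(task)

    def pop(self):
        # Owner takes the most recently pushed task (common, local case).
        with self._lock:
            return self._items.pop() if self._items else None

    def steal(self):
        # Another worker takes the oldest task (occasional, remote case).
        with self._lock:
            return self._items.popleft() if self._items else None
```

In this sketch `pop` corresponds to the frequent intra-work-group path and `steal` to the rare inter-work-group path; the paper's concern is which memory-consistency guarantees each path needs, which a Python lock hides entirely.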

CiteSeerX

Crossref

Kent Academic Repository

Spiral - Imperial College Digital Repository

Using shared-data localization to reduce the cost of inspector-execution in unified-parallel-C programs

Author: Alvanos Michail
Amaral José Nelson
Farreras Esclusa Montserrat
Martorell Bofill Xavier
Tiotto Ettore
Publication venue: 'Elsevier BV'
Publication date: 01/01/2016

Programs written in the Unified Parallel C (UPC) language can access any location of the entire local and remote address space via read/write operations. However, UPC programs that contain fine-grained shared accesses can exhibit performance degradation. One solution is to use the inspector-executor technique to coalesce fine-grained shared accesses into larger remote access operations. A straightforward implementation of the inspector-executor transformation results in excessive instrumentation that hinders performance. This paper addresses this issue and introduces various techniques that aim at reducing the generated instrumentation code: a shared-data localization transformation based on Constant-Stride Linear Memory Descriptors (CSLMADs) [S. Aarseth, Gravitational N-Body Simulations: Tools and Algorithms, Cambridge Monographs on Mathematical Physics, Cambridge University Press, 2003.], the inlining of data locality checks and the usage of an index vector to aggregate the data. Finally, the paper introduces a lightweight loop code motion transformation to privatize shared scalars that were propagated through the loop body. A performance evaluation, using up to 2048 cores of a POWER 775, explores the impact of each optimization and characterizes the overheads of UPC programs. It also shows that the presented optimizations increase performance of UPC programs up to 1.8x over their UPC hand-optimized counterpart for applications with regular accesses and up to 6.3x for applications with irregular accesses. Peer Reviewed. Postprint (author's final draft)
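The inspector-executor technique named in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's UPC transformation: `RemoteArray` is a hypothetical stand-in for a shared array on another node, its `messages` counter simulates network round trips, and the function names are invented for the example. The inspector records which remote elements a loop will touch, a single bulk transfer fetches them, and the executor then runs the original loop against the local copy.

```python
class RemoteArray:
    """Toy stand-in for a shared array on another node (hypothetical)."""

    def __init__(self, data):
        self._data = list(data)
        self.messages = 0  # number of simulated network round trips

    def get(self, i):
        # Fine-grained access: one round trip per element.
        self.messages += 1
        return self._data[i]

    def get_bulk(self, indices):
        # Coalesced access: one round trip for many elements.
        self.messages += 1
        return [self._data[i] for i in indices]


def naive_sum(remote, index):
    # One fine-grained remote read per loop iteration.
    return sum(remote.get(i) for i in index)


def inspector_executor_sum(remote, index):
    # Inspector: record which remote elements the loop will touch.
    needed = sorted(set(index))
    # Coalesce: fetch them all in a single bulk transfer.
    local = dict(zip(needed, remote.get_bulk(needed)))
    # Executor: run the original loop against the local copy.
    return sum(local[i] for i in index)
```

With the index vector `[3, 1, 3, 7]`, the naive loop issues four simulated messages while the inspector-executor version issues one. The inspector pass itself is the instrumentation overhead that the abstract says the paper's optimizations aim to reduce.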

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Continuation-Passing C: compiling threads to events through continuations

Author: A. Adya
A. Dunkels
A. Fischbach
A. van Wijngaarden
A.W. Appel
C. Bruggeman
C. Tismer
C.A.R. Hoare
C.P. Wadsworth
C.T. Haynes
F. Boussinot
G. Kerneis
G. Necula
G.D. Plotkin
Gabriel Kerneis
H. Thielecke
J. Berdine
J. Fischer
J. Reppy
J. Vouillon
J.C. Reynolds
Juliusz Chroboczek
K. Claessen
M. Krohn
M. Wand
M. Welsh
O. Danvy
P. Haller
P. Li
P.J. Landin
R. von Behren
R.K. Dybvig
R.S. Engelschall
S. Srinivasan
S.E. Ganz
T. Harris
T. Johnsson
T. Rompf
V.S. Pai
W.D. Clinger
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/09/2011

In this paper, we introduce Continuation Passing C (CPC), a programming language for concurrent systems in which native and cooperative threads are unified and presented to the programmer as a single abstraction. The CPC compiler uses a compilation technique, based on the CPS transform, that yields efficient code and an extremely lightweight representation for contexts. We provide a proof of the correctness of our compilation scheme. We show in particular that lambda-lifting, a common compilation technique for functional languages, is also correct in an imperative language like C, under some conditions enforced by the CPC compiler. The current CPC compiler is mature enough to write substantial programs such as Hekate, a highly concurrent BitTorrent seeder. Our benchmark results show that CPC is as efficient, while using significantly less space, as the most efficient thread libraries available. Comment: Higher-Order and Symbolic Computation (2012). arXiv admin note: substantial text overlap with arXiv:1202.324
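The idea of compiling threads to events through continuations can be sketched in a few lines. This is an illustration of continuation-passing style in general, not CPC's compiler output or API: every name here (`square_cps`, `spawn`, `yield_now`, `run`, `worker`) is hypothetical. A function in CPS takes an extra argument `k`, the continuation, and passes its result to `k` instead of returning it. A cooperative yield then amounts to placing the current continuation on an event queue.

```python
from collections import deque


def square_cps(x, k):
    # Pass the result to the continuation k instead of returning it.
    return k(x * x)


def add_cps(a, b, k):
    return k(a + b)


def sum_squares_cps(x, y, k):
    # CPS form of the direct-style expression x*x + y*y.
    return square_cps(x, lambda x2: square_cps(y, lambda y2: add_cps(x2, y2, k)))


ready = deque()  # event queue holding suspended continuations


def spawn(thunk):
    ready.append(thunk)


def yield_now(k):
    # A cooperative yield: suspend by enqueuing the rest of the computation.
    ready.append(k)


def run():
    # Event loop: resume continuations in FIFO order until none remain.
    while ready:
        ready.popleft()()


def worker(name, n, log, k):
    # A "thread" written in CPS: log a step, yield, then continue.
    if n == 0:
        return k()
    log.append((name, n))
    yield_now(lambda: worker(name, n - 1, log, k))
```

Spawning two workers, `spawn(lambda: worker("A", 2, log, lambda: None))` and the same for `"B"`, followed by `run()`, interleaves their steps on a single stack. Each suspended thread is represented only by a closure, which gives a rough intuition for the lightweight contexts the abstract describes.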

arXiv.org e-Print Archive

Crossref

Hal-Diderot