A compiler extension for parallelizing arrays automatically on the Cell heterogeneous processor
This paper describes the approaches taken to extend an array
programming language compiler using a Virtual SIMD Machine (VSM)
model for parallelizing array operations on the Cell Broadband Engine heterogeneous
machine. This development is part of ongoing work at the
University of Glasgow on array compilers that benefit
applications in many areas, such as graphics, multimedia, image processing
and scientific computation. Our extended compiler, which is built
upon the VSM interface, eases parallelization by performing it
automatically, without the need for any annotations or process
directives. The preliminary results demonstrate significant performance improvements,
especially for data-intensive applications.
C++ programming language for an abstract massively parallel SIMD architecture
The aim of this work is to define and implement an extended C++ language to
support the SIMD programming paradigm. The C++ programming language has been
extended to express the full potential of an abstract SIMD machine consisting
of a central Control Processor and an N-dimensional toroidal array of Numeric
Processors. Only a few extensions have been added to standard C++, with the
goals of minimising the effort programmers need to learn a new language and of
keeping the performance of the compiled code very high. The proposed language
has been implemented by porting the GNU C++ Compiler to a SIMD
supercomputer.
Comment: 10 pages
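The abstract describes the target machine as a Control Processor driving an N-dimensional toroidal array of Numeric Processors. As an illustration only (not the paper's C++ extension), the wrap-around neighbour addressing that such a torus implies can be sketched in Python. The helper names `torus_neighbor` and `linear_id`, the coordinate-tuple representation and the row-major numbering are all assumptions.

```python
from typing import Tuple

Coord = Tuple[int, ...]

def torus_neighbor(coord: Coord, shape: Coord, axis: int, step: int) -> Coord:
    """Return the coordinate `step` positions away along `axis` on a torus.

    Hypothetical helper: every axis wraps around, so leaving one edge
    re-enters at the opposite edge (modular arithmetic per axis).
    """
    if len(coord) != len(shape):
        raise ValueError("coordinate and shape must have the same dimensionality")
    moved = list(coord)
    # Python's % always returns a non-negative result for a positive modulus,
    # so negative steps wrap correctly as well.
    moved[axis] = (moved[axis] + step) % shape[axis]
    return tuple(moved)

def linear_id(coord: Coord, shape: Coord) -> int:
    """Row-major linear processor number for a coordinate (assumed numbering)."""
    idx = 0
    for c, n in zip(coord, shape):
        idx = idx * n + c
    return idx
```

For example, on a 4 × 4 torus the "east" neighbour of processor (0, 3) is (0, 0), and its "west" neighbour is (0, 2). Keeping the wrap-around in a single modular step is what lets a SIMD shift be issued uniformly to every processor, with no edge cases at the array boundary.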
Handling Parallelism in a Concurrency Model
Programming models for concurrency are optimized for dealing with
nondeterminism, for example to handle asynchronously arriving events. To shield
the developer from data race errors effectively, such models may prevent shared
access to data altogether. However, this restriction also makes them unsuitable
for applications that require data parallelism. We present a library-based
approach for permitting parallel access to arrays while preserving the safety
guarantees of the original model. When applied to SCOOP, an object-oriented
concurrency model, the approach exhibits a negligible performance overhead
compared to ordinary threaded implementations of two parallel benchmark
programs.
Comment: MUSEPAT 201
A study of systems implementation languages for the POCCNET system
This report presents the results of a study of systems implementation languages for the Payload Operations Control Center Network (POCCNET). Criteria for evaluating the languages are developed, and fifteen existing languages are evaluated against these criteria.
DART-MPI: An MPI-based Implementation of a PGAS Runtime System
A Partitioned Global Address Space (PGAS) approach treats a distributed
system as if the memory were shared on a global level. Given such a global view
of memory, the user may program applications much as on shared-memory
systems. This greatly simplifies the task of developing parallel applications,
because no explicit communication has to be specified in the program for data
exchange between different computing nodes. In this paper we present DART, a
runtime environment that implements the PGAS paradigm on large-scale
high-performance computing clusters. A specific feature of our implementation
is the use of the one-sided communication of the Message Passing Interface (MPI)
version 3 (MPI-3) as the underlying communication substrate. We evaluated
the performance of the implementation with several low-level kernels in order
to determine its overheads and limitations in comparison to the underlying MPI-3.
Comment: 11 pages, International Conference on Partitioned Global Address
Space Programming Models (PGAS14)
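In a PGAS runtime, a global array index must be translated into the unit (process) that owns the element and the offset within that unit's local memory segment before a one-sided MPI-3 operation such as `MPI_Get` or `MPI_Put` can target it. The sketch below assumes a simple blocked distribution. The functions `global_to_local` and `local_to_global` and the block layout are illustrative assumptions, not DART's actual interface.

```python
from typing import Tuple

def _block_size(nelems: int, nunits: int) -> int:
    """Elements per unit in a blocked layout: ceil(nelems / nunits)."""
    return -(-nelems // nunits)  # ceiling division without floats

def global_to_local(gidx: int, nelems: int, nunits: int) -> Tuple[int, int]:
    """Map a global index to (owning unit id, local offset).

    Hypothetical layout: the global array is split into `nunits` contiguous
    blocks of equal size; the last unit may hold fewer elements.
    """
    if not 0 <= gidx < nelems:
        raise IndexError("global index out of range")
    block = _block_size(nelems, nunits)
    return gidx // block, gidx % block

def local_to_global(unit: int, offset: int, nelems: int, nunits: int) -> int:
    """Inverse mapping: (unit id, local offset) -> global index."""
    block = _block_size(nelems, nunits)
    gidx = unit * block + offset
    if not (0 <= offset < block and 0 <= gidx < nelems):
        raise IndexError("local position does not hold an element")
    return gidx
```

With 10 elements spread over 4 units, global index 7 lives on unit 2 at offset 1. Because the mapping is a pure function of the index and the layout, every unit can compute the target of a remote read or write locally, with no extra communication. This is what allows a one-sided transfer to replace an explicit send/receive pair.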
Group Communication Patterns for High Performance Computing in Scala
We developed a Functional object-oriented Parallel framework (FooPar) for
high-level high-performance computing in Scala. Central to this framework are
Distributed Memory Parallel Data structures (DPDs), i.e., collections of data
distributed in a shared-nothing system together with parallel operations on
these data. In this paper, we first present FooPar's architecture and the idea
of DPDs and group communications. Then, we show how DPDs can be implemented
elegantly and efficiently in Scala based on the Traversable/Builder pattern,
unifying Functional and Object-Oriented Programming. We prove the correctness
and safety of one communication algorithm and show how specification testing
(via ScalaCheck) can be used to bridge the gap between proof and
implementation. Furthermore, we show that the group communication operations of
FooPar outperform those of the MPJ Express open source MPI-bindings for Java,
both asymptotically and empirically. FooPar has already been shown to be
capable of achieving close-to-optimal performance for dense matrix-matrix
multiplication via JNI. In this article, we present results on a parallel
implementation of the Floyd-Warshall algorithm in FooPar, achieving more than
94% efficiency relative to the serial version on a cluster using 100 cores for
matrices of dimension 38,000 × 38,000.
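For reference, the serial Floyd-Warshall all-pairs shortest-path algorithm takes only a few lines. Assuming the usual definition of parallel efficiency, E = T_serial / (p · T_parallel) for p cores, the 94% figure above would correspond to a speedup of about 94 on 100 cores. The Python below is a plain serial sketch, not FooPar code (which is Scala), and the helper names are hypothetical.

```python
import math
from typing import List

INF = math.inf

def floyd_warshall(dist: List[List[float]]) -> List[List[float]]:
    """Serial all-pairs shortest paths; returns a new distance matrix.

    dist[i][j] is the weight of edge i -> j (INF if absent, 0 on the diagonal).
    """
    n = len(dist)
    d = [row[:] for row in dist]  # copy so the caller's matrix is untouched
    for k in range(n):            # allow vertex k as an intermediate hop
        dk = d[k]
        for i in range(n):
            dik = d[i][k]
            if dik == INF:
                continue          # no path i -> k, so nothing to relax
            di = d[i]
            for j in range(n):
                if dik + dk[j] < di[j]:
                    di[j] = dik + dk[j]
    return d

def parallel_efficiency(t_serial: float, t_parallel: float, cores: int) -> float:
    """E = T_serial / (cores * T_parallel); 1.0 means perfect linear speedup."""
    return t_serial / (cores * t_parallel)
```

Step k of the outer loop needs row k and column k of the current matrix. A distributed version therefore typically broadcasts them to the processes holding the blocks that use them, which is plausibly where group communication operations like FooPar's come into play.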
TriCheck: Memory Model Verification at the Trisection of Software, Hardware, and ISA
Memory consistency models (MCMs), which govern inter-module interactions in a
shared-memory system, are a significant, yet often under-appreciated, aspect of
system design. MCMs are defined at the various layers of the hardware-software
stack, requiring thoroughly verified specifications, compilers, and
implementations at the interfaces between layers. Current verification
techniques evaluate segments of the system stack in isolation, such as proving
compiler mappings from a high-level language (HLL) to an ISA or proving
validity of a microarchitectural implementation of an ISA.
This paper makes a case for full-stack MCM verification and provides a
toolflow, TriCheck, capable of verifying that the HLL, compiler, ISA, and
implementation collectively uphold MCM requirements. The work showcases
TriCheck's ability to evaluate a proposed ISA MCM in order to ensure that each
layer and each mapping is correct and complete. Specifically, we apply TriCheck
to the open source RISC-V ISA, seeking to verify accurate, efficient, and legal
compilations from C11. We uncover under-specifications and potential
inefficiencies in the current RISC-V ISA documentation and identify possible
solutions for each. For example, across the 1,701 litmus tests examined, we find
that a RISC-V-compliant microarchitecture allows 144 outcomes forbidden by C11
to be observed. Overall, this paper demonstrates the necessity of
full-stack verification for detecting MCM-related bugs in the hardware-software
stack.
Comment: Proceedings of the Twenty-Second International Conference on
Architectural Support for Programming Languages and Operating Systems
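A litmus test is a tiny concurrent program paired with a question about which final register values are observable. The sketch below makes this concrete. It enumerates every sequentially consistent interleaving of the classic message-passing (MP) test and collects the reachable outcomes. The outcome that never appears, r1 = 1 and r2 = 0, is also forbidden by C11 when the flag `y` is written with a release store and read with an acquire load. A compiled program running on hardware must not expose it in that case. This brute-force enumerator is an illustrative assumption and does not reflect how TriCheck itself works. It assumes every location starts at 0 and every register name is unique.

```python
from typing import Iterator, List, Optional, Sequence, Set, Tuple

# An instruction is ("st", location, value) or ("ld", location, register).
Instr = Tuple[str, str, object]

# Classic message-passing (MP) litmus test.
MP: List[List[Instr]] = [
    [("st", "x", 1), ("st", "y", 1)],        # thread 0: write data, then flag
    [("ld", "y", "r1"), ("ld", "x", "r2")],  # thread 1: read flag, then data
]

def interleavings(threads: Sequence[Sequence[Instr]],
                  pos: Optional[Tuple[int, ...]] = None) -> Iterator[Tuple[Instr, ...]]:
    """Yield every merge of the threads that preserves each thread's program order."""
    if pos is None:
        pos = tuple(0 for _ in threads)
    if all(p == len(prog) for p, prog in zip(pos, threads)):
        yield ()
        return
    for t, prog in enumerate(threads):
        if pos[t] < len(prog):
            nxt = pos[:t] + (pos[t] + 1,) + pos[t + 1:]
            for rest in interleavings(threads, nxt):
                yield (prog[pos[t]],) + rest

def sc_outcomes(threads: Sequence[Sequence[Instr]],
                regs: Sequence[str]) -> Set[Tuple[int, ...]]:
    """Final register values reachable under sequential consistency."""
    outcomes = set()
    for schedule in interleavings(threads):
        mem = {}      # every location implicitly starts at 0
        values = {}
        for op, loc, arg in schedule:
            if op == "st":
                mem[loc] = arg
            else:
                values[arg] = mem.get(loc, 0)
        outcomes.add(tuple(values[r] for r in regs))
    return outcomes
```

Running `sc_outcomes(MP, ("r1", "r2"))` explores the six possible interleavings and yields {(0, 0), (0, 1), (1, 1)}. Full-stack checking asks whether each layer, down to the microarchitecture, still rules out the missing (1, 0) case whenever the language-level model does.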