7,365 research outputs found
Compiling vector pascal to the XeonPhi
Intel's XeonPhi is a highly parallel x86 architecture chip made by Intel. It has a number of novel features which make it a particularly challenging target for the compiler writer. This paper describes the techniques used to port the Glasgow Vector Pascal Compiler to this architecture and assess its performance by comparisons of the XeonPhi with 3 other machines running the same algorithms
Parallel String Sample Sort
We discuss how string sorting algorithms can be parallelized on modern
multi-core shared memory machines. As a synthesis of the best sequential string
sorting algorithms and successful parallel sorting algorithms for atomic
objects, we propose string sample sort. The algorithm makes effective use of
the memory hierarchy, uses additional word level parallelism, and largely
avoids branch mispredictions. Additionally, we parallelize variants of multikey
quicksort and radix sort that are also useful in certain situations.Comment: 34 pages, 7 figures and 12 table
Modula-2*: An extension of Modula-2 for highly parallel programs
Parallel programs should be machine-independent, i.e., independent of properties that are likely to differ from one parallel computer to the next. Extensions are described of Modula-2 for writing highly parallel, portable programs meeting these requirements. The extensions are: synchronous and asynchronous forms of forall statement; and control of the allocation of data to processors. Sample programs written with the extensions demonstrate the clarity of parallel programs when machine-dependent details are omitted. The principles of efficiently implementing the extensions on SIMD, MIMD, and MSIMD machines are discussed. The extensions are small enough to be integrated easily into other imperative languages
Array languages and the N-body problem
This paper is a description of the contributions to the SICSA multicore challenge on many body
planetary simulation made by a compiler group at the University of Glasgow. Our group is part of
the Computer Vision and Graphics research group and we have for some years been developing array
compilers because we think these are a good tool both for expressing graphics algorithms and for
exploiting the parallelism that computer vision applications require.
We shall describe experiments using two languages on two different platforms and we shall compare
the performance of these with reference C implementations running on the same platforms. Finally
we shall draw conclusions both about the viability of the array language approach as compared to
other approaches used in the challenge and also about the strengths and weaknesses of the two, very
different, processor architectures we used
A compiler extension for parallelizing arrays automatically on the cell heterogeneous processor
This paper describes the approaches taken to extend an array
programming language compiler using a Virtual SIMD Machine (VSM)
model for parallelizing array operations on Cell Broadband Engine heterogeneous
machine. This development is part of ongoing work at the
University of Glasgow for developing array compilers that are beneficial
for applications in many areas such as graphics, multimedia, image processing
and scientific computation. Our extended compiler, which is built
upon the VSM interface, eases the parallelization processes by allowing
automatic parallelisation without the need for any annotations or process
directives. The preliminary results demonstrate significant improvement
especially on data-intensive applications
Engineering Parallel String Sorting
We discuss how string sorting algorithms can be parallelized on modern
multi-core shared memory machines. As a synthesis of the best sequential string
sorting algorithms and successful parallel sorting algorithms for atomic
objects, we first propose string sample sort. The algorithm makes effective use
of the memory hierarchy, uses additional word level parallelism, and largely
avoids branch mispredictions. Then we focus on NUMA architectures, and develop
parallel multiway LCP-merge and -mergesort to reduce the number of random
memory accesses to remote nodes. Additionally, we parallelize variants of
multikey quicksort and radix sort that are also useful in certain situations.
Comprehensive experiments on five current multi-core platforms are then
reported and discussed. The experiments show that our implementations scale
very well on real-world inputs and modern machines.Comment: 46 pages, extension of "Parallel String Sample Sort" arXiv:1305.115
- …